naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

setImage is re-run unnecessarily when rotateAuto is enabled #892

Closed by Balearica 7 months ago

Balearica commented 7 months ago

Detecting the page angle currently requires automatic page segmentation to be enabled. Therefore, if rotateAuto is set to true but the current PSM does not support page-angle detection, page segmentation is first run with PSM set to 3 (AUTO) to retrieve the page angle, and then run a second time with the PSM requested by the user.

If the user has already set PSM to 3, page segmentation should run only once (provided no rotation is detected). However, due to a bug in the implementation of this feature, page segmentation currently runs twice even in this case.
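To make the intended behavior concrete, here is a minimal sketch of the pass-counting logic described above. It is not the code from worker-script/index.js: `expectedSegmentationPasses` and `ANGLE_CAPABLE_PSMS` are hypothetical names, and the assumption that a detected rotation always triggers a second pass is inferred from the description rather than taken from the source.

```javascript
// Hypothetical model of the *intended* behavior, not the actual implementation.
// Assumption: only PSM '3' (AUTO) is treated as angle-capable here; the real
// list in tesseract.js may contain more modes.
const ANGLE_CAPABLE_PSMS = ['3'];

// Returns how many times page segmentation is expected to run when
// rotateAuto is enabled.
function expectedSegmentationPasses(requestedPsm, rotationDetected) {
  // The first pass always runs with a PSM that can report the page angle.
  let passes = 1;

  // A second pass is needed if the first pass did not use the requested PSM,
  // or (assumed) if the image has to be rotated and re-processed.
  const firstPassUsedRequestedPsm = ANGLE_CAPABLE_PSMS.includes(requestedPsm);
  if (!firstPassUsedRequestedPsm || rotationDetected) {
    passes += 1;
  }
  return passes;
}

const passesAutoNoRotation = expectedSegmentationPasses('3', false);
const passesAutoRotated = expectedSegmentationPasses('3', true);
const passesOtherPsm = expectedSegmentationPasses('6', false);
```

Under these assumptions, a user-requested PSM of '3' with no detected rotation needs a single pass, while any other PSM or a detected rotation needs two.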

https://github.com/naptha/tesseract.js/blob/master/src/worker-script/index.js#L402-L407

Specifically, this code does not account for the fact that the PSM object stores PSM values as strings, whereas api.GetPageSegMode returns an integer. As a result, the check that occurs is effectively ['3'].includes(3), which evaluates to false.
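The mismatch can be reproduced in isolation. The sketch below uses a trimmed-down, illustrative `PSM` object and a hard-coded integer in place of the value returned by api.GetPageSegMode; the `String(...)` normalization is one possible fix, not necessarily the one that was merged.

```javascript
// Illustrative subset of the PSM constants, stored as strings as described above.
const PSM = { AUTO: '3', SINGLE_BLOCK: '6' };

// Hypothetical list of modes that support page-angle detection.
const anglePsms = [PSM.AUTO];

// Stand-in for the integer returned by api.GetPageSegMode.
const currentPsm = 3;

// Array.prototype.includes compares with SameValueZero, which does not
// coerce types, so the string '3' never matches the number 3.
const buggyCheck = anglePsms.includes(currentPsm);

// Normalizing the integer to a string before comparing makes the check pass.
const fixedCheck = anglePsms.includes(String(currentPsm));

console.log(buggyCheck, fixedCheck);
```

Converting at the comparison site keeps the string-valued PSM constants unchanged, so other code that relies on them is unaffected.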