Open Balearica opened 1 day ago
Upon a brief review, it looks like setting the default parameters here may have served a couple different purposes in the past.
`user_defined_dpi` to 300, which is not a default behavior. I now am fairly confident that this can be cut without consequence, so will do so.
If we cut the settings discussed above, the only thing left in the defaultParams.js file is the `tessjs_create_hocr`/`tessjs_create_tsv`/etc. settings that were deprecated multiple major releases ago. Therefore, we should be able to cut that entire file. The only thing to confirm is that the default output formats stay the same before/after, as otherwise this would be a breaking change.
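One way to run the before/after check is to compare which output fields are populated in a result produced by each build. This is a minimal sketch, not the project's test code: the helper names and the sample result objects are hypothetical, and it assumes that formats which were not produced show up as `null` or `undefined`.

```javascript
// Hypothetical helper: list the output fields that actually carry a value.
function populatedFormats(data) {
  return Object.keys(data)
    .filter((key) => data[key] !== null && data[key] !== undefined)
    .sort();
}

// Hypothetical helper: true when both results expose the same set of formats.
function sameOutputFormats(before, after) {
  const a = populatedFormats(before);
  const b = populatedFormats(after);
  return a.length === b.length && a.every((key, i) => key === b[i]);
}

// Made-up sample results standing in for real recognition output.
const resultBefore = { text: 'abc', hocr: null, tsv: null };
const resultAfter = { text: 'abc', hocr: null, tsv: null };
const resultChanged = { text: 'abc', hocr: '<div></div>', tsv: null };

console.log(sameOutputFormats(resultBefore, resultAfter));
console.log(sameOutputFormats(resultBefore, resultChanged));
```

Comparing only which fields are populated, rather than their contents, keeps the check focused on the default output formats and not on recognition quality.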
The `createWorker` `config` argument allows for setting parameters prior to initialization. While this function was originally added to support a handful of init-only parameters (notably `load_system_dawg`, `load_number_dawg`, and `load_punc_dawg`), it should be able to support all parameters, and there is nothing in the documentation to indicate it only supports specific parameters.
However, at present, any settings provided in this `config` argument that conflict with the default parameters defined in defaultParams.js are overwritten by the defaults. It looks like this only impacts `tessedit_pageseg_mode` and `tessedit_char_whitelist`, as these are the only Tesseract parameters in the defaults file.
https://github.com/naptha/tesseract.js/blob/a936162d92b03bc04f51c4bfb5db14e588209838/src/worker-script/index.js#L308-L309
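The overwrite described above comes down to the order in which the two parameter sets are applied. The following is a simplified sketch of that ordering problem, not the actual worker code: the function names are hypothetical, and the values in `defaultParams` are illustrative placeholders, not copied from defaultParams.js.

```javascript
// Hypothetical stand-in for the defaults file (placeholder values).
const defaultParams = {
  tessedit_pageseg_mode: '6',
  tessedit_char_whitelist: '',
};

// Order described in the issue: defaults are applied last, so they win
// over any conflicting key in the user's config.
function applyDefaultsLast(config) {
  return { ...config, ...defaultParams };
}

// Alternative order: the user's config is applied last, so it wins.
function applyConfigLast(config) {
  return { ...defaultParams, ...config };
}

// Example user config mixing an init-only parameter with a conflicting one.
const userConfig = {
  load_system_dawg: '0',
  tessedit_char_whitelist: '0123456789',
};

const overwritten = applyDefaultsLast(userConfig);
const preserved = applyConfigLast(userConfig);

console.log(overwritten.tessedit_char_whitelist); // user value replaced by the default
console.log(preserved.tessedit_char_whitelist); // user value kept
```

In both orders `load_system_dawg` passes through untouched, which is consistent with the observation that only parameters also present in the defaults file are affected.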
I will investigate the commit history before making a change; however, I currently believe the code that sets the default Tesseract parameters can be cut entirely. Both values we are setting are already the defaults for the Tesseract API, so it's unclear why we are setting them manually.