Open pgaviganHC opened 1 year ago
Based on my understanding, Tesseract (the current OCR library) excels at documents, while CRAFT (the model mentioned in the link) is for text detection in more complicated images. They work well in conjunction.
Another option is EasyOCR, which does not require any external (non-Python) dependencies (Issue #4). It uses CRAFT for text detection and then its own OCR engine, so it's essentially Tesseract+CRAFT in a single library, but potentially less powerful because it is more lightweight.
Good insight here, thanks. I wonder if we could quantify the performance difference between these options with a simple test of some sort?
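One way the simple test suggested here could look: compute a character error rate (CER) for each engine against hand-typed ground truth and compare the averages. This is only a rough sketch. `edit_distance`, `char_error_rate`, and `compare_engines` are hypothetical helper names, not part of this library, and each engine is assumed to be wrapped as a plain function that takes an image path and returns text.

```python
from typing import Callable, Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(predicted: str, truth: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not truth:
        return 0.0 if not predicted else 1.0
    return edit_distance(predicted, truth) / len(truth)


def compare_engines(
    engines: Dict[str, Callable[[str], str]],
    samples: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Mean CER per engine over (image_path, ground_truth) pairs."""
    if not samples:
        raise ValueError("need at least one labelled sample")
    results = {}
    for name, run_ocr in engines.items():
        rates = [char_error_rate(run_ocr(path), truth) for path, truth in samples]
        results[name] = sum(rates) / len(rates)
    return results


# Assumed wiring for the real engines (requires pytesseract, Pillow, easyocr):
#   reader = easyocr.Reader(["en"])
#   engines = {
#       "tesseract": lambda p: pytesseract.image_to_string(Image.open(p)),
#       "easyocr": lambda p: " ".join(reader.readtext(p, detail=0)),
#   }
```

Lower is better; a CER of 0.0 means the output matched the ground truth exactly. A fair comparison would probably need a mix of clean documents and more complicated images, since that is where the two engines are expected to differ.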
EasyOCR has been recommended to us by Microsoft for use in MS Fabric (the current Tesseract implementation cannot be installed in the MS Fabric environment).
We need to do a security check on EasyOCR.
It may be worth trying some alternative OCR libraries, as discussed in this article: https://www.statcan.gc.ca/en/data-science/network/character-recognition
It might be a good idea to make these alternatives available as options in this library.