Text Extractor FREEDOM - Episode 1 "Fight all biases."

AlexDeMoura commented 1 year ago

Description of the new feature / enhancement

Add a < none > choice to the Preferred language option that recognizes the "text" at the character level, with no corrections or syntax recognition.

Scenario when this would be used?

This would let it recognize, for example, Excel formulas, math equations, chemical formulas, and text for which no OCR pack is installed. At the moment, it cannot even extract the numbers and formula functions in this image:

Supporting information

This is EASY to implement. Please, there is NO need for a Math/Scientific pack or any other pack to be installed! A plain < none > option should be enough: character-level recognition ONLY.

crutkas commented 1 year ago

Do you have the English OCR language pack installed? /needinfo

AlexDeMoura commented 1 year ago

Yes, I have - this is not the problem.

TheJoeFin commented 1 year ago

Text Extractor currently uses the Windows OCR API, which does not have an option for a "None" language. Text Grab can also use Tesseract, and I am experimenting with how to bring that functionality over to Text Extractor as well. Part of the difficulty is making languages easy to install, update, and manage.

For now, I will close this issue as a duplicate of #20899, because a different OCR engine would be required to implement this feature.

microsoft / PowerToys