scambier / obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
GNU General Public License v3.0

[Feature request] Add Romanian language support #46

Closed by PruteanuVlad 1 year ago

PruteanuVlad commented 1 year ago

Hello from Romania! I would greatly appreciate it if you added support for Romanian, or provided a framework for users to do so.

scambier commented 1 year ago

Hello, it looks like "ron" is the language you should select in the list for Romanian :)

https://github.com/naptha/tesseract.js/blob/4970ceaabbbe10eacef1f201182e23b9b07a8c35/src/constants/languages.js#L82
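
For illustration only, here is a minimal sketch of how a language name maps to its Tesseract code. The `TESSERACT_CODES` map and the `tesseractCodeFor` helper are hypothetical names, not part of the plugin or of tesseract.js, and the map holds only a handful of the codes from the file linked above.

```typescript
// Hypothetical lookup table: a small subset of the three-letter language
// codes listed in tesseract.js's languages.js. Not the plugin's own code.
const TESSERACT_CODES = new Map<string, string>([
  ["english", "eng"],
  ["french", "fra"],
  ["german", "deu"],
  ["romanian", "ron"],
  ["spanish", "spa"],
]);

// Returns the Tesseract code for a language name, or undefined when the
// name is not in the (deliberately incomplete) table above.
function tesseractCodeFor(languageName: string): string | undefined {
  return TESSERACT_CODES.get(languageName.trim().toLowerCase());
}

console.log(tesseractCodeFor("Romanian")); // prints: ron
```

The code is "ron" rather than "rom" or "ro", which is why a plain web search for it can be misleading.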

PruteanuVlad commented 1 year ago

Thank you for the quick response, and sorry for not catching that. I saw that option, but a superficial Google search showed me some unrelated results. Thank you once again!

scambier commented 1 year ago

By the way, I just found the proper language reference: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html