shreevatsa / chaya

OCR alternatives #2

Closed shreevatsa closed 5 months ago

shreevatsa commented 6 months ago

Add an option to use Google OCR (bring your own API key) instead of Tesseract.js.
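A rough sketch of what the Google path could look like, assuming "Google OCR" here means the Cloud Vision `images:annotate` REST endpoint called with a user-supplied API key. The helper names (`buildVisionRequest`, `extractText`, `googleOcr`) are hypothetical and not taken from this repository:

```typescript
// Sketch only: all function and interface names below are hypothetical.
interface VisionRequest {
  requests: {
    image: { content: string }; // base64-encoded image bytes
    features: { type: string }[];
  }[];
}

interface VisionResponse {
  responses?: { fullTextAnnotation?: { text?: string } }[];
}

// Build the JSON body for a single-image document OCR request.
function buildVisionRequest(base64Image: string): VisionRequest {
  return {
    requests: [
      {
        image: { content: base64Image },
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      },
    ],
  };
}

// Pull the recognized text out of the response, or "" if none is present.
function extractText(response: VisionResponse): string {
  return response.responses?.[0]?.fullTextAnnotation?.text ?? "";
}

// Send one image to the Vision API using the key supplied by the user.
async function googleOcr(base64Image: string, apiKey: string): Promise<string> {
  const url =
    "https://vision.googleapis.com/v1/images:annotate?key=" +
    encodeURIComponent(apiKey);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(base64Image)),
  });
  if (!res.ok) {
    throw new Error(`Vision API request failed with status ${res.status}`);
  }
  return extractText((await res.json()) as VisionResponse);
}
```

Keeping request construction and response parsing separate from the network call makes both easy to check without a real key.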

shreevatsa commented 5 months ago

Also, if using Tesseract, allow specifying the language/script: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
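Tesseract takes multiple languages as codes joined with `+` (for example `san+eng`), so the UI selection could be turned into that string with something like the following. `tesseractLangs` is a hypothetical helper, and whether script-level models from the page above can be loaded by Tesseract.js is an open question this sketch does not answer:

```typescript
// Hypothetical helper: turn selected language codes into Tesseract's
// "+"-joined form, dropping blanks and duplicates while keeping order.
function tesseractLangs(codes: string[]): string {
  const cleaned = codes.map((c) => c.trim()).filter((c) => c.length > 0);
  if (cleaned.length === 0) {
    throw new Error("at least one language code is required");
  }
  const unique = cleaned.filter((c, i) => cleaned.indexOf(c) === i);
  return unique.join("+");
}

// Possible use with Tesseract.js (not executed here):
//   Tesseract.recognize(image, tesseractLangs(["san", "eng"]))
```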

shreevatsa commented 5 months ago

I think that between the changes in the googleocr branch and the uncommitted UI changes on my laptop, this will soon be in good shape.

Currently it finds images to OCR by searching through the nodes; instead, it should take pages from the global pages array. Each node's text can then be replaced as its page is OCRed.
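A minimal sketch of that loop, under assumptions: the `Page` shape, `ocrPages`, and the `onPageDone` callback are all hypothetical, and the OCR backend is passed in as a function so either Tesseract.js or Google OCR could be plugged in:

```typescript
// Hypothetical shape of an entry in the global pages array.
interface Page {
  imageUrl: string;
  text?: string; // undefined until the page has been OCRed
}

// Any OCR backend: takes an image URL, resolves to recognized text.
type OcrFn = (imageUrl: string) => Promise<string>;

// OCR pages one at a time, in order. After each page finishes,
// onPageDone is called so the caller can replace the matching node's text.
async function ocrPages(
  pages: Page[],
  ocr: OcrFn,
  onPageDone?: (index: number, page: Page) => void
): Promise<void> {
  for (let i = 0; i < pages.length; i++) {
    const page = pages[i];
    if (page.text !== undefined) continue; // skip pages already OCRed
    page.text = await ocr(page.imageUrl);
    if (onPageDone) onPageDone(i, page);
  }
}
```

Iterating over the pages array rather than the nodes keeps the OCR step independent of how the document tree is laid out; the callback is the only place that needs to know about nodes.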

shreevatsa commented 5 months ago

Fixed with 1df32229a84fb5d14a0b294c387d996569047907.