shreevatsa / chaya


Two-column mode / OCR on manually specified regions #8

Open shreevatsa opened 5 months ago

shreevatsa commented 5 months ago

Will likely need this

shreevatsa commented 5 months ago

The reason for starting with OCR instead of a blank editor was to avoid having to split pages that had already been added. But it may be useful to instead add one page at a time (with the OCR option chosen on a per-page basis). That would also be better than firing off hundreds of requests to the Google Vision API.

shreevatsa commented 4 months ago

Possible UI:

shreevatsa commented 4 months ago

Example of a modal div: https://chatgpt.com/c/fd4eae00-9c84-4e0d-bb82-06bc9873207a modal.zip

shreevatsa commented 4 months ago

After the chunk split (either LR, i.e. left/right, or UD, i.e. up/down) is decided, we need to:

  1. Delete the original chunk from the editor,
  2. Run OCR on the new regions, get lines out of the results, and chunks out of the lines,
  3. Add back these lines and chunks (with updated xmin and xmax) to the doc.
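For reference, a rough sketch of how the new regions for step 2 could be computed from the original chunk's bounding box. Everything here is hypothetical: `Region`, `split_region`, and the `ymin`/`ymax` fields are made-up names for illustration (only `xmin` and `xmax` are known to exist in the schema), and the split position `at` is assumed to come from the UI.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical bounding box; ymin/ymax are assumed, not part of the known schema."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def split_region(region: Region, direction: str, at: float) -> Tuple[Region, Region]:
    """Split a region in two at coordinate `at`.

    "LR" cuts vertically at x == at (left and right halves);
    "UD" cuts horizontally at y == at (upper and lower halves).
    """
    if direction == "LR":
        if not region.xmin < at < region.xmax:
            raise ValueError("split position must lie strictly inside the region")
        left = Region(region.xmin, region.ymin, at, region.ymax)
        right = Region(at, region.ymin, region.xmax, region.ymax)
        return left, right
    if direction == "UD":
        if not region.ymin < at < region.ymax:
            raise ValueError("split position must lie strictly inside the region")
        upper = Region(region.xmin, region.ymin, region.xmax, at)
        lower = Region(region.xmin, at, region.xmax, region.ymax)
        return upper, lower
    raise ValueError(f"unknown split direction: {direction!r}")
```

For a two-column page, `split_region(page, "LR", at=gutter_x)` would give the two regions to OCR separately, where `gutter_x` is whatever x position was picked in the UI.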

Currently the schema already has lines' xmin and xmax, so in principle no schema changes are needed, although maintaining words is a bit annoying (#21).

We need some code refactoring to be able to do (2) easily.
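One possible shape for that refactoring, as a sketch only: pass the OCR call in as a function, so the same code path can run on a whole page or on a manually specified region. All names below (`Line`, `Chunk`, `replace_chunk`, the `ocr` callback) are hypothetical and not the actual chaya code. As a simplification, each region becomes a single chunk and every line just inherits the region's `xmin`/`xmax`; real OCR results would presumably carry per-line bounds.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box format: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]
# Hypothetical OCR callback: takes a region, returns the text of each line in it.
OcrFn = Callable[[Box], List[str]]


@dataclass
class Line:
    text: str
    xmin: float
    xmax: float


@dataclass
class Chunk:
    lines: List[Line]


def chunks_for_region(box: Box, ocr: OcrFn) -> List[Chunk]:
    """Step 2: OCR one region, turn the results into lines, then into chunks."""
    xmin, _, xmax, _ = box
    lines = [Line(text, xmin, xmax) for text in ocr(box)]
    # Simplification: one chunk per region; an empty region yields no chunk.
    return [Chunk(lines)] if lines else []


def replace_chunk(doc: List[Chunk], index: int, boxes: List[Box], ocr: OcrFn) -> List[Chunk]:
    """Steps 1 and 3: drop the chunk at `index`, splice in chunks for the new regions."""
    new_chunks = [chunk for box in boxes for chunk in chunks_for_region(box, ocr)]
    # Return a new list rather than mutating, so the caller decides how to apply it.
    return doc[:index] + new_chunks + doc[index + 1:]
```

Keeping `ocr` as a parameter also means the split logic can be tested with a fake function, without sending any requests to the Vision API.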