shreevatsa / chaya


Get started #1

Closed shreevatsa closed 5 months ago

shreevatsa commented 6 months ago

Probably the ProseMirror (PM) model should be the "container", holding the PDF bounding boxes along with the corresponding text.

PM's "view" will take care of rendering the PDF page (bounding box) images, and the OCRed text.
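To make the "container" idea concrete, here is a minimal sketch of what such a document could look like in ProseMirror's JSON form (nodes with `type`, `attrs`, `content`). The node names `page` and `region`, the `bbox` and `pageIndex` attributes, and the helper functions are all assumptions for illustration, not the schema this project uses.

```typescript
// Hypothetical shape only: "page", "region", "bbox" and "pageIndex" are
// illustrative names, not the project's actual schema.
interface BBox { x: number; y: number; width: number; height: number; }

interface TextJSON { type: "text"; text: string; }
interface RegionJSON {
  type: "region";
  attrs: { pageIndex: number; bbox: BBox };
  content?: TextJSON[];
}
interface PageJSON { type: "page"; attrs: { pageIndex: number }; content: RegionJSON[]; }
interface DocJSON { type: "doc"; content: PageJSON[]; }

// One region = one bounding box on a PDF page plus the text OCRed from it.
function makeRegion(pageIndex: number, bbox: BBox, text: string): RegionJSON {
  const region: RegionJSON = { type: "region", attrs: { pageIndex, bbox } };
  // ProseMirror does not allow empty text nodes, so omit content when there is no text.
  if (text.length > 0) region.content = [{ type: "text", text }];
  return region;
}

// Build the whole document: one "page" node per PDF page, each holding its regions.
function makeDoc(pages: { bbox: BBox; text: string }[][]): DocJSON {
  return {
    type: "doc",
    content: pages.map((regions, pageIndex): PageJSON => ({
      type: "page",
      attrs: { pageIndex },
      content: regions.map((r) => makeRegion(pageIndex, r.bbox, r.text)),
    })),
  };
}
```

With a shape like this, the view only needs a region's `attrs` to crop the page image and its text content to show the OCR result next to it.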

shreevatsa commented 6 months ago

So schema something like:

Wondering what to do with individual words. For Google OCR I already have some logic for splitting the output into lines; maybe standardize on words instead?
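If words become the standard unit, lines can still be derived from them when needed. The sketch below is one simple way to do that, not the existing Google OCR logic mentioned above: it assumes each word comes with an axis-aligned box (the `OcrWord` type is hypothetical), that y grows downward, and that text runs left to right.

```typescript
// Hypothetical word record: text plus an axis-aligned bounding box.
interface OcrWord { text: string; x: number; y: number; width: number; height: number; }

// Group words into lines by vertical position. This is a simple heuristic and
// would not handle skewed scans or multi-column layouts.
function groupWordsIntoLines(words: OcrWord[]): OcrWord[][] {
  // Sort by vertical centre so words on the same line end up adjacent.
  const sorted = [...words].sort(
    (a, b) => (a.y + a.height / 2) - (b.y + b.height / 2),
  );
  const lines: OcrWord[][] = [];
  for (const word of sorted) {
    const current = lines[lines.length - 1];
    const centre = word.y + word.height / 2;
    // A word joins the current line if its centre falls within the vertical
    // extent of that line's first word; otherwise it starts a new line.
    if (current && centre >= current[0].y && centre <= current[0].y + current[0].height) {
      current.push(word);
    } else {
      lines.push([word]);
    }
  }
  // Order each line left to right (assumes a left-to-right script).
  for (const line of lines) line.sort((a, b) => a.x - b.x);
  return lines;
}
```

Keeping words as the stored unit and computing lines on demand means the same model can serve OCR engines that report lines and those that only report words.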

shreevatsa commented 6 months ago

As an initial step / getting feet wet, let's just turn each page, individually, into a PM-controlled thing.

Can use:

as reference.

shreevatsa commented 5 months ago

This was basically a "Get started" issue; renamed and closing now. The ProseMirror schema / document model will evolve as work on this code continues; it's not something that can be declared "done".