shreevatsa / chaya


Break apart lines #19

Open shreevatsa opened 2 months ago

shreevatsa commented 2 months ago

Sometimes what the OCR reports as a single "line" actually spans multiple lines of text, and right now there's no way to insert a line break within it.

image

We could either:

shreevatsa commented 2 months ago

Ouch, notice this giant "line" (because of the X):

image

(Google OCR result for the same page as previous comment; the screenshot in previous comment was from Tesseract)

shreevatsa commented 2 months ago

Now that we have individual words in line.attrs, we could in principle re-form lines. (What to do with changed text though?)
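A rough sketch of what re-forming lines from words could look like, written in Python only for illustration. The `Word` shape (text plus a bounding box) and the `tolerance` parameter are assumptions made for this sketch, not the actual structure of `line.attrs`. It also does not answer the changed-text question: it only regroups the OCR words as they were originally detected.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical shape: the real entries in line.attrs may differ.
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    h: float  # height of the bounding box


def regroup_words(words, tolerance=0.5):
    """Group words into lines by vertical position.

    A word joins the current line if its vertical centre lies within
    tolerance * (its height) of the mean centre of that line so far.
    """
    def centre(w):
        return w.y + w.h / 2

    lines = []
    for w in sorted(words, key=centre):
        if lines:
            current = lines[-1]
            mean = sum(centre(v) for v in current) / len(current)
            if abs(centre(w) - mean) <= tolerance * w.h:
                current.append(w)
                continue
        lines.append([w])
    # Within each line, order words left to right.
    return [sorted(line, key=lambda w: w.x) for line in lines]
```

For example, three words where two share roughly the same baseline would come back as two lines. Stray marks such as a large X or scattered dots would still need separate handling, since their boxes can overlap several real lines.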

shreevatsa commented 2 months ago

I think this line is caused by the various dots:

image

So instead of something operating on line.attrs.words, it may be better to just implement splitting manually (#8).

(Also because we may get rid of line.attrs.words: #21.)
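For reference, a minimal sketch of a manual split that works on the line's text alone, so it does not depend on line.attrs.words. It is in Python for illustration only; the function name `split_line_at` and the idea of splitting at a character offset (e.g. the cursor position) are assumptions of this sketch, not what #8 specifies.

```python
def split_line_at(text, offset):
    """Split one line's text into two lines at a character offset.

    Whitespace next to the split point is trimmed so neither new line
    starts or ends with a stray space.
    """
    if not 0 < offset < len(text):
        raise ValueError("offset must fall strictly inside the line")
    return text[:offset].rstrip(), text[offset:].lstrip()
```

Because this operates on the current text, it works the same whether or not the text has been edited since OCR.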