Open raindropsfromsky opened 4 years ago
Related to #165 and the discussion there.
So, from the explanation you gave in the other issue, here's what I infer:
"textify" = Text extraction. This process is done on a file that already has searchable text. Since the text is already there, Qiqqa only assigns coordinates to each word.
"OCR" = Tesseract-based OCR. This process is done if the page is a scanned image with no machine-searchable text. After that, Qiqqa assigns coordinates to each word.
Please confirm?
Correct.
Nitpick: "textify" = extracting both the words and the coordinates.
If no textify has been done, then there's nothing: just a file (which happens to be a PDF) and an (empty) metadata record in the library database.
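As a rough illustration of the distinction, here is a minimal Python sketch. It is not Qiqqa's actual code: `Word`, `Page`, `choose_step` and `count_steps` are hypothetical names invented for this example, and the real pipeline is surely more involved. It only models the rule described above: a page with an embedded text layer gets textified (words plus coordinates are extracted), while a page that is just a scanned image goes to OCR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    text: str
    # Hypothetical bounding box: (left, top, width, height) on the page.
    box: Tuple[float, float, float, float]


@dataclass
class Page:
    # Words already present in the PDF's text layer.
    # Empty for a page that is only a scanned image (assumption of this model).
    embedded_words: List[Word] = field(default_factory=list)


def choose_step(page: Page) -> str:
    """Return 'textify' if the page has a text layer, otherwise 'OCR'."""
    return "textify" if page.embedded_words else "OCR"


def count_steps(pages: List[Page]) -> Tuple[int, int]:
    """Return (pages needing textify, pages needing OCR)."""
    steps = [choose_step(p) for p in pages]
    return steps.count("textify"), steps.count("OCR")


pages = [
    Page([Word("Hello", (10.0, 10.0, 40.0, 12.0))]),  # has a text layer
    Page(),                                           # scanned image only
]
print(count_steps(pages))  # (1, 1)
```

Either way the end result in this model is the same kind of data, a list of words with coordinates; the two steps differ only in where the words come from.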
HTH
Qiqqa's design is definitely inspired by a ransom note, the kind composed of words cut out of newspapers and magazines. :D
The status line often says "x pages to textify and y pages to OCR". But this peculiar word "textify" is not explained anywhere! It is not an industry-standard word used in any particular business.
Therefore, the Qiqqa manual and the help file on the website must explain this word, and how it affects the performance of Qiqqa (search results, and also the "save pdf as text" function).