hasu / notdeft

NotDeft note manager for Emacs
https://tero.hasu.is/notdeft/

Extracting and indexing PDFs is slow. Could pdftotext be a solution? #14

Open poulpoulsen opened 4 years ago

poulpoulsen commented 4 years ago

Hello, I use your tool because it is very fast and stable. Unfortunately, PDF indexing is very slow out of the box. Extracting the text with pdftotext or a similar tool might be faster. Is that possible?

Regards, Poul

hasu commented 4 years ago

As NotDeft is a tool for managing plain text notes, and not documents in general, I don't think that is really within the remit of NotDeft. There are desktop search tools like Recoll that are able to index the text from PDFs and several other document formats.

I wouldn't index PDFs directly with NotDeft, since you'll end up with a lot of garbage in your search index that way, and such large files probably also slow down filtering in `notdeft-mode' buffers.

I agree that using something like pdftotext could work quite well in turning select PDF documents into plain text note files for purposes of annotation and linking to other notes in your collection.

You could, for example, implement an Emacs command to make such PDF importing convenient. That command would perhaps first invoke your chosen tool for extracting the text, and then call the `notdeft-create-file' function to import the text as a note.
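
To make the extraction step more concrete, here is a minimal sketch of it in Python, run outside Emacs. It assumes a `pdftotext` binary (from Poppler or Xpdf) is on your PATH. The function names, the `.txt` extension, and the title/source header it writes are all hypothetical choices for illustration, not anything NotDeft prescribes. An Emacs command could run the same `pdftotext` invocation and hand the result to `notdeft-create-file' instead; that function's arguments are not covered here.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, layout=False):
    """Build the argument list for a pdftotext run that prints to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8", "-nopgbrk"]
    if layout:
        # Ask pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    # "-" as the output file name makes pdftotext write to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def note_path_for(pdf_path, notes_dir, ext=".txt"):
    """Derive the note file name from the PDF's base name (an assumption)."""
    return Path(notes_dir) / (Path(pdf_path).stem + ext)


def import_pdf_as_note(pdf_path, notes_dir, layout=False):
    """Extract the text of one PDF and save it as a new plain text note."""
    target = note_path_for(pdf_path, notes_dir)
    if target.exists():
        # Refuse to overwrite a note that may already carry annotations.
        raise FileExistsError(target)
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        check=True,            # raise if pdftotext exits with an error
        capture_output=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    # Hypothetical header: a title line plus a pointer back to the PDF.
    header = f"{Path(pdf_path).stem}\n\nSource: {pdf_path}\n\n"
    target.write_text(header + text, encoding="utf-8")
    return target
```

Calling `import_pdf_as_note("paper.pdf", "/path/to/notes")` on a select document would then leave a `paper.txt` note in that directory, which NotDeft can index like any other note. Converting one chosen PDF at a time, rather than a whole directory, keeps the search index free of the garbage mentioned above.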