Open zimengzhou1 opened 3 months ago
Absolutely! This would be great to add, particularly support for plain text and PDFs. Would you be keen to add this?
Sure, I'll have a crack at it!
I did a bit of poking around, and supporting PDFs seems significantly harder than I thought, more so after reading this article.
I tried using the "pdf-parse" library to extract text from the PDFs, but after testing the parser on several research papers, it was clear that the chunks stored in the vector database and shown in "Related Notes" were poorly formatted (especially equations and tables) and not very relevant.
Supporting plain text was pretty trivial: I just added ".txt" as an allowed extension.
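For anyone curious what that kind of change looks like, here is a rough sketch of an extension allow-list check. The names `ALLOWED_EXTENSIONS`, `extensionOf`, and `isIndexableFile` are made up for illustration and are not taken from the reor codebase; the real check may live elsewhere and look different.

```typescript
// Hypothetical allow-list: ".txt" sits next to the existing ".md" entry.
const ALLOWED_EXTENSIONS = new Set<string>([".md", ".txt"]);

// Return the lowercased extension (including the dot), or "" if there is none.
// Dotfiles such as ".gitignore" are treated as having no extension.
function extensionOf(filePath: string): string {
  const lastSeparator = Math.max(
    filePath.lastIndexOf("/"),
    filePath.lastIndexOf("\\")
  );
  const baseName = filePath.slice(lastSeparator + 1);
  const dotIndex = baseName.lastIndexOf(".");
  return dotIndex > 0 ? baseName.slice(dotIndex).toLowerCase() : "";
}

// A file is picked up for indexing only if its extension is on the allow-list.
function isIndexableFile(filePath: string): boolean {
  return ALLOWED_EXTENSIONS.has(extensionOf(filePath));
}
```

With this shape, `isIndexableFile("notes/todo.TXT")` passes while `isIndexableFile("papers/attention.pdf")` is still rejected, which matches the current state: plain text is in, PDFs are not.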
What are our thoughts on supporting document types other than Markdown, for example PDF or plain text? It would also be nice if users could directly use other note-taking apps like Notion as a source of their data, since that would lower the barrier to entry for using Reor.