ElCuboNegro opened this issue 9 months ago
Thank you for opening this, @ElCuboNegro! Yes, I agree that this is definitely something that would be super good to add. Do you think you'd want to work on a PR for this?
Yes
(Reply sent by email to samlhuillier's comment of 24 Feb 2024, 7:07 a.m.)
Do you have any documentation on how you're doing the RAG?
I'm asking with the aim of implementing something like this: https://ai.plainenglish.io/unlocking-whole-dataset-reasoning-why-knowledge-graphs-are-the-future-of-ai-systems-fc8726367808
Any updates on this? I would like to help with a PR.
That would be great, @Haze-sh! Which part specifically do you want to work on? Loading PDFs or indexing content from the web?
I was thinking about PDF loading. Can you suggest a starting point?
To load in PDFs, I guess we'd probably want to build out a couple of separate features in stages:
1. Extending markdownExtensions, which is used to limit the files Reor reads to those with markdown extensions. The first thing we'd probably do is add the pdf extension to this list. The list of extensions is used to generate the FileInfoList, which is essentially a tree representation of the metadata of each file Reor uses as context.
2. Updating the readFile and read-file ipc handler functions to have two calls: one for indexing, which will use a library like pdf-parse to read the actual text content of the file so that we can index it in the vector database, and one for display, where the ipc handler will probably want to read the PDF file in base64 so that it can be returned to the renderer process and rendered. (Bear in mind that both these calls will basically be an if statement to check for the special case of PDF files.)
3. Modifying the openFileByPath function in use-file-by-filepath.ts to add custom logic for the case where the file extension is pdf. The line editor?.commands.setContent(fileContent); will not need to be run, as it sets the content of our current TipTap editor, which probably won't work with PDF content.

Let me know if you have any other questions! Very happy to help :)
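For illustration, the three stages above could be sketched roughly as follows. This is a minimal, hypothetical TypeScript sketch, not Reor's actual code: the names supportedExtensions, FileIO, ExtractPdfText, readFileForIndexing, readFileForDisplay, toBase64, EditorLike and openLoadedFile are all made up here, the contents of markdownExtensions are assumed, file I/O is injected instead of calling fs, and pdf-parse is represented only by an assumed ExtractPdfText wrapper rather than called directly.

```typescript
// Hypothetical sketch: every name here is illustrative, not Reor's actual code.
// File I/O and PDF text extraction are injected so the logic runs without Electron.

// Stage 1: accept .pdf alongside the markdown extensions (list contents assumed).
const markdownExtensions: string[] = [".md", ".markdown"];
const supportedExtensions: string[] = [...markdownExtensions, ".pdf"];

// Lower-cased extension of the file name itself (ignores dots in directory names).
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  return dot === -1 ? "" : base.slice(dot).toLowerCase();
}

const isSupportedFile = (filePath: string): boolean =>
  supportedExtensions.includes(extensionOf(filePath));
const isPdf = (filePath: string): boolean => extensionOf(filePath) === ".pdf";

// Injected I/O, standing in for fs reads in the main process (assumption).
interface FileIO {
  readText: (filePath: string) => string;
  readBytes: (filePath: string) => Uint8Array;
}
// Assumed wrapper around a PDF text-extraction library such as pdf-parse.
type ExtractPdfText = (data: Uint8Array) => Promise<string>;

// Dependency-free base64 encoder (3 bytes -> 4 chars, "=" padding).
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (bytes[i] << 16) | (b1 << 8) | b2;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Stage 2a: plain text for the vector database; PDFs go through the extractor.
async function readFileForIndexing(
  filePath: string,
  io: FileIO,
  extractPdfText: ExtractPdfText,
): Promise<string> {
  if (isPdf(filePath)) {
    return extractPdfText(io.readBytes(filePath));
  }
  return io.readText(filePath);
}

// Stage 2b: what the display call would return to the renderer; PDFs travel as base64.
type DisplayContent =
  | { kind: "pdf"; base64: string }
  | { kind: "markdown"; text: string };

function readFileForDisplay(filePath: string, io: FileIO): DisplayContent {
  if (isPdf(filePath)) {
    return { kind: "pdf", base64: toBase64(io.readBytes(filePath)) };
  }
  return { kind: "markdown", text: io.readText(filePath) };
}

// Minimal shape of the editor call mentioned above (assumed, not TipTap's full type).
interface EditorLike {
  commands: { setContent: (content: string) => void };
}

// Stage 3: renderer-side branch; PDFs bypass the editor and go to a PDF viewer callback.
function openLoadedFile(
  content: DisplayContent,
  editor: EditorLike | null,
  showPdf: (base64: string) => void,
): void {
  if (content.kind === "pdf") {
    showPdf(content.base64); // setContent is deliberately skipped for PDFs
    return;
  }
  editor?.commands.setContent(content.text);
}
```

Injecting the reader and the extractor keeps the PDF branch testable outside Electron; in the app these would presumably wrap fs reads in the main process and a pdf-parse call, and the hand-rolled encoder could be replaced by Node's Buffer.from(bytes).toString("base64").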
Hello, is there any progress on loading PDFs?
It would be really useful to add things directly to the context collection through URLs, or by passing files (PDFs) directly into the vault. Maybe I can help with those shenanigans.