mukulpatnaik / researchgpt

An LLM-based research assistant that allows you to have a conversation with a research paper
https://www.dara.chat
MIT License
3.55k stars 340 forks

Increasing maximum paper length to not run into AttributeError: 'DataFrame' object has no attribute 'embeddings' #43

Closed Spoot1RH closed 1 year ago

Spoot1RH commented 1 year ago

I noticed that when I put a PDF of about 30 pages into the program, I get the error: AttributeError: 'DataFrame' object has no attribute 'embeddings'. For the time being, I think it would be useful for the program to warn you that the text is too long and cannot be read in.
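A minimal sketch of what such a warning could look like. The function names, the 8000-token budget, and the rule of thumb of about 4 characters per token are all assumptions for illustration, not taken from the researchgpt code, and I have not confirmed that length is what actually triggers the AttributeError.

```python
import warnings

# Assumption: roughly 4 characters per token, a coarse rule of thumb for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def warn_if_too_long(text: str, max_tokens: int = 8000) -> bool:
    """Warn and return True if the estimated token count exceeds max_tokens.

    max_tokens is a hypothetical budget; pick one that matches the model in use.
    """
    n_tokens = estimate_tokens(text)
    if n_tokens > max_tokens:
        warnings.warn(
            f"Document is roughly {n_tokens} tokens, above the "
            f"{max_tokens}-token budget; it may be too long to process."
        )
        return True
    return False
```

Calling warn_if_too_long(pdf_text) before building the embeddings would at least surface the problem early instead of failing later with an unrelated-looking error.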

I understand that the API has a token limit, but it also has a memory. Would it be possible to expand the code to split longer PDFs into pieces so that they can be fed to the API bit by bit in the background?
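To illustrate the kind of splitting I mean, here is a rough sketch that cuts the extracted text into overlapping word windows and groups them into batches that could be sent to the API one at a time. The names chunk_words and batched, and the default sizes, are hypothetical; the actual API call is left out on purpose.

```python
from typing import Iterator, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> Iterator[str]:
    """Yield chunks of at most max_words words, each sharing `overlap` words with the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + max_words])
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(450))
    chunks = list(chunk_words(sample))
    for batch in batched(chunks, 2):
        # Placeholder: each batch would be passed to the embeddings endpoint here.
        print(len(batch), "chunk(s) in this batch")
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk. Counting words instead of tokens is a simplification.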

Further, I would like to know how many pages, on average, are safe to input with the current build. What are your experiences?

mukulpatnaik commented 1 year ago

Hi @Spoot1RH, I have made a major refactor of the code, fixing many issues and moving from Flask to FastAPI. If you are still interested, please consider running git fetch to update your code to the latest version, then follow the steps in the README to run the app. Thanks for trying it!

Consider using the online web app www.dara.chat; it includes many optimizations that are not yet public and can handle large PDFs well. Let me know if you're still having issues!