mukulpatnaik / researchgpt

An LLM-based research assistant that allows you to have a conversation with a research paper
https://www.dara.chat
MIT License
3.55k stars, 340 forks

Is there a bug in Line64 of parse_paper #46

Closed: SunGaofeng closed this issue 1 year ago

SunGaofeng commented 1 year ago

https://github.com/mukulpatnaik/researchgpt/blob/102439378a184e792afb7c1f9ecada68ddca55f6/main-local.py#L64

for i in range(number_of_pages):
    ...
    for t in page_text:
        """do something on processed_text"""
        ...
        paper_text += processed_text

On this line, processed_text is appended to paper_text on every iteration over page_text. However, processed_text itself seems to accumulate across those iterations, so by the end of the loop it already holds the text of the whole page. As a result, paper_text ends up with a lot of duplicates.
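The duplication can be reproduced with a small self-contained sketch. The function names and the sample page data below are hypothetical and are not taken from main-local.py; the sketch only assumes that processed_text is a per-page list that grows inside the inner loop, and compares appending it inside that loop with appending it once per page.

```python
def collect_inside_inner_loop(pages):
    """Append processed_text on every inner iteration (the reported behavior)."""
    paper_text = []
    for page_text in pages:
        processed_text = []  # assumed to be reset once per page
        for t in page_text:
            processed_text.append(t)      # stand-in for the real processing
            paper_text += processed_text  # re-adds everything accumulated so far
    return paper_text


def collect_once_per_page(pages):
    """Append processed_text once, after the inner loop has finished."""
    paper_text = []
    for page_text in pages:
        processed_text = []
        for t in page_text:
            processed_text.append(t)
        paper_text += processed_text      # one append per page, no repeats
    return paper_text


if __name__ == "__main__":
    sample_pages = [["a", "b", "c"], ["d", "e"]]  # hypothetical page fragments
    print(collect_inside_inner_loop(sample_pages))
    # ['a', 'a', 'b', 'a', 'b', 'c', 'd', 'd', 'e']
    print(collect_once_per_page(sample_pages))
    # ['a', 'b', 'c', 'd', 'e']
```

With n fragments on a page, the first variant adds 1 + 2 + ... + n entries for that page instead of n, which would explain the amount of duplication.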

When I change the code to the following, the duplicates in paper_text go away.

for i in range(number_of_pages):
    ...
    for t in page_text:
        """do something on processed_text"""
        ...
    paper_text += processed_text

mukulpatnaik commented 1 year ago

Thanks @SunGaofeng, I have fixed this now.