for i in range(number_of_pages):
    ...
    for t in page_text:
        """do something on processed_text"""
        ...
        paper_text += processed_text
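On this line (line 64 of main-local.py), `processed_text` is added to `paper_text` on every iteration over `page_text`. But `processed_text` itself keeps growing and, by the end of the loop, holds the text of the whole page, so `paper_text` seems to end up with a lot of duplicates.
When I change the code to the following, so that the addition runs once per page instead of once per item in `page_text`, the duplicates in `paper_text` are gone.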
for i in range(number_of_pages):
    ...
    for t in page_text:
        """do something on processed_text"""
        ...
    paper_text += processed_text
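
To make the effect easy to reproduce, here is a minimal, self-contained sketch. It is not the repository's code: the `collect` helper, its `append_inside_inner_loop` flag, the `t.upper()` stand-in for the real processing, and the toy `pages` data are all assumptions made for illustration.

```python
def collect(pages, append_inside_inner_loop):
    """Toy stand-in for the parsing loop (hypothetical helper).

    `pages` is a list of pages, each page being a list of text items.
    """
    paper_text = []
    for page_text in pages:
        processed_text = []
        for t in page_text:
            # Placeholder for the real per-item processing.
            processed_text.append(t.upper())
            if append_inside_inner_loop:
                # Original placement: re-adds everything collected so far.
                paper_text += processed_text
        if not append_inside_inner_loop:
            # Changed placement: adds each page's result exactly once.
            paper_text += processed_text
    return paper_text


pages = [["a", "b", "c"], ["d", "e"]]
buggy = collect(pages, append_inside_inner_loop=True)
fixed = collect(pages, append_inside_inner_loop=False)

print(buggy)  # ['A', 'A', 'B', 'A', 'B', 'C', 'D', 'D', 'E']
print(fixed)  # ['A', 'B', 'C', 'D', 'E']
```

With the addition inside the inner loop, a page with n items contributes 1 + 2 + ... + n entries instead of n, which matches the duplicates I am seeing.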
https://github.com/mukulpatnaik/researchgpt/blob/102439378a184e792afb7c1f9ecada68ddca55f6/main-local.py#L64