mukulpatnaik / researchgpt

An LLM-based research assistant that allows you to have a conversation with a research paper
https://www.dara.chat
MIT License
3.55k stars 340 forks

Why are you validating a length of 30 characters? #44

Closed estebance closed 1 year ago

estebance commented 1 year ago

Hi, I just want to clarify the purpose of this block:

    for row in pdf:
        if len(row['text']) < 30:
            continue
        filtered_pdf.append(row)

Why is the criterion 30 characters?

I'd like to contribute to the project, but first I need to understand a little about the implementation.

mukulpatnaik commented 1 year ago

Hi, sorry for the late response. The 30-character threshold is there to ignore subheadings, image captions, and other tiny pieces of text that may not be relevant.
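For anyone reading along, here is a minimal sketch of the same filter with the threshold pulled out into a named constant. The names `MIN_TEXT_LENGTH` and `filter_short_rows` and the sample rows are hypothetical, made up for illustration. The only things taken from the thread are the `row['text']` lookup and the cutoff of 30.

```python
# Hypothetical constant name; the value 30 is the cutoff discussed in this thread.
MIN_TEXT_LENGTH = 30


def filter_short_rows(rows, min_length=MIN_TEXT_LENGTH):
    """Keep only rows whose 'text' has at least min_length characters.

    Equivalent to the loop in the question: rows with
    len(row['text']) < 30 are skipped, everything else is kept.
    """
    return [row for row in rows if len(row["text"]) >= min_length]


# Made-up sample rows: a subheading, a short caption, and a body sentence.
rows = [
    {"text": "2. Methods"},
    {"text": "Figure 1: Model overview"},
    {"text": "We evaluate the approach on three public benchmarks and report accuracy."},
]

filtered = filter_short_rows(rows)
for row in filtered:
    print(row["text"])
```

With these sample rows only the body sentence survives, since the subheading and the caption are both under 30 characters. A fixed character count is a heuristic, so a short but meaningful sentence would be dropped as well. Exposing the cutoff as a parameter makes it easier to experiment with.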