Closed yagneshgooglegithub closed 1 week ago
As an open source project, we welcome anyone to contribute, including adding comments you deem necessary. Speaking of the 4 variables in the example: the first variable refers to the overlapping portions between different chunks when splitting documents using a sliding window approach - this concept is extremely common in RAG. The second is the tokenizer used to calculate document length. For the remaining two concepts, "gleaning" refers to the number of times entities are extracted from the same chunk, as mentioned in the GraphRAG paper. The last one is the token threshold that triggers summarization when extracted entities exceed a certain length.
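To make the four settings described above more concrete, here is a minimal sketch of how they might fit together. It is not the repository's actual code: the field names (`chunk_size`, `chunk_overlap`, `max_gleaning`, `summary_token_threshold`), the default values, and the whitespace tokenizer are all assumptions made for illustration. A real setup would count tokens with a proper model tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ChunkingConfig:
    # All field names and defaults here are hypothetical, for illustration only.
    chunk_size: int = 8                 # tokens per chunk (sliding window width)
    chunk_overlap: int = 2              # tokens shared by consecutive chunks
    max_gleaning: int = 1               # how many times entities are extracted per chunk
    summary_token_threshold: int = 500  # summarize descriptions longer than this


def tokenize(text):
    # Stand-in tokenizer (assumption): splits on whitespace.
    # It only exists so that lengths can be measured in "tokens".
    return text.split()


def sliding_window_chunks(text, cfg):
    """Split text into token chunks where neighbours share cfg.chunk_overlap tokens."""
    if not 0 <= cfg.chunk_overlap < cfg.chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = tokenize(text)
    step = cfg.chunk_size - cfg.chunk_overlap  # how far the window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + cfg.chunk_size])
        if start + cfg.chunk_size >= len(tokens):
            break  # the window already reached the end of the document
    return chunks


def needs_summary(description, cfg):
    """True when an accumulated entity description exceeds the token threshold."""
    return len(tokenize(description)) > cfg.summary_token_threshold


cfg = ChunkingConfig()
document = " ".join(f"t{i}" for i in range(20))
chunks = sliding_window_chunks(document, cfg)
```

With a window of 8 tokens and an overlap of 2, the window advances 6 tokens at a time, so the last two tokens of each chunk reappear as the first two tokens of the next one.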
Thanks for the reply. I get the point that it's an open source project. But please try to add comments to all the fields of the various classes if possible. For others, it could take 2 to 3 hours of digging around to work out what the authors could explain in 2 to 3 seconds. Thanks again, the repo is very easy to use in comparison to Microsoft's one.
Most of the private functions and other modules don't have comprehensive docstrings, which makes it harder to read the code and adapt it to a custom use case. Please provide those as early as possible.
For example, I am having a hard time understanding many things in the code, which would have been easier if it had comments and docstrings.