`RA.retrieve`: AttributeError: 'NoneType' object has no attribute 'encode'

younes-io commented 6 months ago

I asked this:

question = "What is the topic described in Article 202 ?"

answer = RA.retrieve(question, collapse_tree=True)

print("Answer: ", answer)

and got this:

{
    "name": "AttributeError",
    "message": "'NoneType' object has no attribute 'encode'",
    "stack": "---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 4
      1 question = \"What is the topic described in Article 202?\"
      3 # answer = RA.answer_question(question=question)
----> 4 answer = RA.retrieve(question, collapse_tree=True)
      6 print(\"Answer: \", answer)

File /workspaces/aider_repos/raptor/RetrievalAugmentation.py:250, in RetrievalAugmentation.retrieve(self, question, start_layer, num_layers, max_tokens, collapse_tree, return_layer_information)
    245 if self.retriever is None:
    246     raise ValueError(
    247         \"The TreeRetriever instance has not been initialized. Call 'add_documents' first.\"
    248     )
--> 250 return self.retriever.retrieve(
    251     question,
    252     start_layer,
    253     num_layers,
    254     max_tokens,
    255     collapse_tree,
    256     return_layer_information,
    257 )

File /workspaces/aider_repos/raptor/tree_retriever.py:293, in TreeRetriever.retrieve(self, query, start_layer, num_layers, max_tokens, collapse_tree, return_layer_information)
    291 if collapse_tree:
    292     logging.info(f\"Using collapsed_tree\")
--> 293     selected_nodes, context = self.retrieve_information_collapse_tree(
    294         query, max_tokens
    295     )
    296 else:
    297     layer_nodes = self.tree.layer_to_nodes[start_layer]

File /workspaces/aider_repos/raptor/tree_retriever.py:176, in TreeRetriever.retrieve_information_collapse_tree(self, query, max_tokens)
    174 for idx in indices:
    175     node = node_list[idx]
--> 176     node_tokens = len(self.tokenizer.encode(node.text))
    178     if total_tokens + node_tokens > max_tokens:
    179         break

AttributeError: 'NoneType' object has no attribute 'encode'"
}
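The last frame of the traceback shows the failure mode: `self.tokenizer` is `None` at the point where `encode` is called on it. Below is a minimal sketch of that pattern and one possible guard. It assumes hypothetical stand-in classes (`WhitespaceTokenizer`, `Retriever`) and is not RAPTOR's actual code or its actual fix.

```python
class WhitespaceTokenizer:
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""

    def encode(self, text):
        return text.split()


class Retriever:
    """Hypothetical stand-in, not RAPTOR's TreeRetriever."""

    def __init__(self, tokenizer=None):
        # If no tokenizer is configured, this attribute stays None.
        self.tokenizer = tokenizer

    def count_tokens(self, text):
        # Raises AttributeError when self.tokenizer is None.
        return len(self.tokenizer.encode(text))

    def count_tokens_guarded(self, text):
        # One possible guard (an assumption): fall back to a default tokenizer.
        tokenizer = self.tokenizer or WhitespaceTokenizer()
        return len(tokenizer.encode(text))


def reproduce_error():
    """Return the message of the error raised by the unguarded call."""
    try:
        Retriever().count_tokens("some node text")
    except AttributeError as exc:
        return str(exc)
    return None
```

Calling `reproduce_error()` triggers the same kind of `AttributeError` as in the report, while `count_tokens_guarded` avoids it by never calling `encode` on `None`.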

parthsarthi03 commented 6 months ago

Thanks for catching this! I've replicated the issue, and we'll be pushing a fix soon. In the meantime, you can set collapse_tree=False and it should work. Also, if you want to answer the question and not just retrieve the context, you should use RA.answer_question(question=question).

younes-io commented 6 months ago

@parthsarthi03: yes, I already tried answer_question and it works. I want to test and eventually use RAPTOR as a retriever =) Thank you

younes-io commented 6 months ago

I tried collapse_tree=False, but it doesn't answer correctly. I hope collapse_tree=True will expose the entire tree to the retriever so it can find the correct answer (the real question is a difficult one, not the one above).

parthsarthi03 commented 6 months ago

This should be fixed with https://github.com/parthsarthi03/raptor/commit/81b0c95b80ccb3576b0a70a2a5207c7a899d1b4f.

parthsarthi03 commented 6 months ago

https://github.com/parthsarthi03/raptor/pull/12 has added top_k support for collapse_tree and sets collapse_tree to True by default. There are two extra parameters that you can change: top_k and max_tokens.

top_k controls the number of top nodes to consider when retrieving information, defaulting to 10. max_tokens caps the number of tokens in the retrieved context, defaulting to 3500.

Example usage:

context, __ = RA.retrieve(question, top_k=10, max_tokens=3500)
answer = RA.answer_question(question, top_k=10, max_tokens=3500)
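To illustrate how the two limits can interact, here is a minimal sketch of a selection loop that ranks nodes, keeps at most top_k of them, and stops once the token budget would be exceeded. It is loosely modeled on the loop visible in the traceback above. The function name `select_nodes`, the precomputed `scores`, and the whitespace token counter are assumptions for illustration, not RAPTOR's implementation.

```python
def select_nodes(nodes, scores, top_k=10, max_tokens=3500, count_tokens=None):
    """Pick up to top_k highest-scoring nodes without exceeding max_tokens.

    nodes: list of text strings; scores: one similarity per node (higher is better).
    count_tokens: hypothetical token counter; defaults to a whitespace word count.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    # Indices of the top_k best-scoring nodes, best first.
    ranked = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)[:top_k]

    selected, total_tokens = [], 0
    for idx in ranked:
        node_tokens = count_tokens(nodes[idx])
        # Stop at the first node that would push the context over the budget.
        if total_tokens + node_tokens > max_tokens:
            break
        selected.append(nodes[idx])
        total_tokens += node_tokens
    return selected, total_tokens
```

In this sketch, lowering top_k narrows the candidate set before any token counting happens, while max_tokens can cut the result short even when fewer than top_k nodes have been added.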

parthsarthi03 commented 6 months ago

Closing this issue for now. If you have any further questions or encounter additional issues, please feel free to reopen it.

younes-io commented 6 months ago

thank you @parthsarthi03

parthsarthi03 / raptor, issue #9