lapisrocks / LanguageAgentTreeSearch

[ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models"
https://arxiv.org/abs/2310.04406
MIT License

Potential Bug in evaluate_node and question on self-consistency reward #33

Open sjaelee25 opened 3 weeks ago

sjaelee25 commented 3 weeks ago

Hello. While using the LATS code, I ran into what looks like a bug, and I also have a question.

First, in the evaluate_node function, value prompting should presumably be performed only for non-terminal children, with a value of 0 assigned to terminal children. However, I suspect there is a bug. In the example below, values appear to be computed for Lookup[slightly soluble in] and Lookup[soluble in alcohol], but they are then assigned to the Finish[alcohol, CdCl2] nodes. Could you please check this?

generated actions:

Cadmium Chloride is slightly soluble in alcohol and is also known as CdCl2. I need to confirm this information and provide the answer. Action 2: Finish[alcohol, CdCl2]

Cadmium Chloride is slightly soluble in alcohol and also known as CdCl2. I need to search for what Cadmium Chloride is slightly soluble in and its other name. Action 2: Finish[alcohol, CdCl2]

Cadmium Chloride is slightly soluble in alcohol and also known as CdCl2. I still need to confirm the solubility in alcohol and search for the other name. Action 2: Lookup[slightly soluble in]

Cadmium chloride is slightly soluble in alcohol and also known as CdCl2. I need to confirm this information before providing the answer. Action 2: Lookup[soluble in alcohol]

Cadmium Chloride is slightly soluble in alcohol and also known as CdCl2. I need to search for what Cadmium Chloride is slightly soluble in and its other name. Action 2: Finish[alcohol, CdCl2]

(Bold-faced actions are the ones sent to value prompting.)

votes: [0.2, 1.0, 0, 0]

updated values of child nodes:

Node(depth=2, value=0.20, visits=0, thought=Cadmium Chloride is slightly soluble in alcohol and is also known as CdCl2. I need to confirm this information and provide the answer., action=Finish[alcohol, CdCl2], observation=Episode finished, reward = 0)

Node(depth=2, value=1.00, visits=0, thought=Cadmium Chloride is slightly soluble in alcohol and also known as CdCl2. I need to search for what Cadmium Chloride is slightly soluble in and its other name., action=Finish[alcohol, CdCl2], observation=Episode finished, reward = 0)

Node(depth=2, value=0.00, visits=0, thought=Cadmium Chloride is slightly soluble in alcohol and also known as CdCl2. I still need to confirm the solubility in alcohol and search for the other name., action=Lookup[slightly soluble in], observation=No more results.)

Node(depth=2, value=0.00, visits=0, thought=Cadmium chloride is slightly soluble in alcohol and also known as CdCl2. I need to confirm this information before providing the answer., action=Lookup[soluble in alcohol], observation=(Result 1 / 1) This salt is a hygroscopic solid that is highly soluble in water and slightly soluble in alcohol.)
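To make the suspected problem concrete, here is a minimal sketch. It is not the repository's code. It assumes that scores are computed only for non-terminal children, that the resulting list is padded with zeros, and that values are then assigned to all children by position. The Node class and both helper functions are hypothetical stand-ins I wrote for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified stand-in for the tree node in the log above.
    action: str
    is_terminal: bool = False
    value: float = 0.0


def assign_by_position(children, votes):
    """Suspected behavior: pad the votes with zeros, then assign by index."""
    padded = votes + [0.0] * (len(children) - len(votes))
    for child, vote in zip(children, padded):
        child.value = vote


def assign_to_non_terminal(children, votes):
    """Expected behavior: each score goes to the child it was computed for."""
    scored = iter(votes)
    for child in children:
        # Terminal children were never scored, so they get 0.
        child.value = 0.0 if child.is_terminal else next(scored)


def make_children():
    # Same order as the child nodes in the log: two Finish, then two Lookup.
    return [
        Node("Finish[alcohol, CdCl2]", is_terminal=True),
        Node("Finish[alcohol, CdCl2]", is_terminal=True),
        Node("Lookup[slightly soluble in]"),
        Node("Lookup[soluble in alcohol]"),
    ]


# Scores computed for the two Lookup children only (values from the log).
votes = [0.2, 1.0]

buggy = make_children()
assign_by_position(buggy, votes)
buggy_values = [c.value for c in buggy]

fixed = make_children()
assign_to_non_terminal(fixed, votes)
fixed_values = [c.value for c in fixed]

print("by position:    ", buggy_values)
print("by non-terminal:", fixed_values)
```

Under these assumptions, positional assignment reproduces the values in my log (0.2 and 1.0 land on the two Finish nodes), while matching each score to the child it was computed for puts them on the Lookup nodes instead.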


Second, I am unable to find where the code implements the self-consistency reward and the hyperparameter lambda that is applied as a weight in the value function calculation. Could you please explain this part?
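For reference, this is roughly what I expected to find. It is only a sketch of my reading of the paper, where the value is a weighted combination V(s) = lambda * LM(s) + (1 - lambda) * SC(s). The function names are hypothetical, and treating the self-consistency score as the fraction of sampled actions that match the candidate is my own assumption, not something I found in the repository.

```python
def self_consistency(sampled_actions, candidate):
    """Assumed SC score: fraction of sampled actions equal to the candidate."""
    if not sampled_actions:
        return 0.0
    return sampled_actions.count(candidate) / len(sampled_actions)


def combined_value(lm_score, sc_score, lam):
    """Weighted value: lam * LM(s) + (1 - lam) * SC(s)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * lm_score + (1.0 - lam) * sc_score


# The five sampled actions from the log above.
sampled = [
    "Finish[alcohol, CdCl2]",
    "Finish[alcohol, CdCl2]",
    "Lookup[slightly soluble in]",
    "Lookup[soluble in alcohol]",
    "Finish[alcohol, CdCl2]",
]

sc = self_consistency(sampled, "Finish[alcohol, CdCl2]")
value = combined_value(lm_score=1.0, sc_score=sc, lam=0.5)
print(f"SC = {sc:.2f}, V = {value:.2f}")
```

I could not locate anything resembling either function, or a lambda hyperparameter, in the value computation.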

Thanks!