maitrix-org / llm-reasoners

A library for advanced large language model reasoning
https://www.llm-reasoners.net/

Getting 0 Accuracy/Errors with DFS+Blocksworld+GPT3.5 Turbo #100

sumedhpendurkar opened this issue 1 week ago

sumedhpendurkar commented 1 week ago

I was able to set up the code. I also wrote a small code snippet (happy to send a PR) implementing the `get_loglikelihood` function with OpenAI's API (using the `logprobs` and `top_logprobs` arguments). However, if I run the provided ToT DFS example with depth=2, I get 0 accuracy on Blocksworld. I used GPT-3.5 Turbo for this test.
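For reference, a minimal sketch of what such an OpenAI-based `get_loglikelihood` might look like; the function name mirrors the library's interface, but the single-token / top-20 approximation and every other detail below are assumptions for illustration, not the author's actual snippet:

```python
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_loglikelihood(prefix: str, candidates: list[str],
                      model: str = "gpt-3.5-turbo") -> list[float]:
    """Approximate log P(candidate | prefix) for short candidates.

    Requests the logprobs of the next token after `prefix` and looks each
    candidate up in the returned top-20 list. Candidates whose leading token
    is not in that list get -inf; multi-token candidates would need a more
    elaborate scheme (e.g. scoring one token at a time).
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prefix}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=20,  # 20 is the API's maximum
    )
    top = resp.choices[0].logprobs.content[0].top_logprobs
    table = {t.token: t.logprob for t in top}
    # Note: token strings may carry leading whitespace, so exact matching
    # of candidate text against tokens is itself an approximation.
    return [table.get(c, -math.inf) for c in candidates]
```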

Is this normal, or am I missing something? (Something seems wrong given the results in your paper.) I have attached the results log to this issue: result.log

Note that I also see this message (which seems concerning): /bin/sh: 1: None/validate: not found
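A guess at the mechanism behind that message: if the validator's location is read from an environment variable that was never exported, Python's `str(None)` ends up in the shell command. The variable name below is assumed purely for illustration; the README referenced later in this thread gives the actual setup steps.

```python
import os

# Illustration only: an unset environment variable becomes the literal text
# "None" once it is formatted into a shell command, which is exactly the
# "None/validate" prefix in the reported error.
val_home = os.environ.get("VAL")  # None when the variable is not exported
cmd = f"{val_home}/validate domain.pddl problem.pddl plan.txt"
print(cmd)  # -> "None/validate domain.pddl problem.pddl plan.txt"
# os.system(cmd) would then print: /bin/sh: 1: None/validate: not found
```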

I tried disabling the prior (by just setting the default in the DFS search to False), but ended up getting this error:

File "llm-reasoners/reasoners/algorithm/dfs.py", line 128, in dfs new_node.reward, new_node.reward_details = config.reward(cur_state, action, aux, fast_reward_details) File "/llm-reasoners/examples/ToT/blocksworld/tot_inference.py", line 97, in reward intuition, self_eval = kwargs['intuition'], kwargs['self_eval'] KeyError: 'intuition'

Ber666 commented 1 week ago

Hi,

Could you try following the README at https://github.com/maitrix-org/llm-reasoners/tree/main/examples/CoT/blocksworld to set up the validate tool?

sumedhpendurkar commented 1 week ago

Thanks, that fixed the first problem (the validate tool). I'm still having issues when I set prior=False with DFS.

Ber666 commented 1 week ago

Could you explain your motivation for setting prior=False? If I understand correctly, that would make the DFS essentially random, and therefore not very meaningful.

sumedhpendurkar commented 5 hours ago

I just wanted to evaluate how good random selection is, or how much the LLM helps guide the search, i.e., to study whether there are any cases where the LLM is confidently wrong (leading the search into bad parts of the space).