princeton-nlp / tree-of-thought-llm

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601
MIT License

KeyError: 'r_word' when the model is gpt-3.5-turbo in crosswords #19

Closed li-aolong closed 1 year ago

li-aolong commented 1 year ago

When I run crosswords/standard_sampling.sh with gpt-3.5-turbo and task_index is 6, I get the following error:

Traceback (most recent call last):
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 174, in <module>
    run(args)
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 117, in run
    infos = [task.test_output(i, y) for y in ys]
  File "/home/yhli/codes/tree-of-thought-llm/run.py", line 117, in <listcomp>
    infos = [task.test_output(i, y) for y in ys]
  File "/home/yhli/codes/tree-of-thought-llm/tasks/crosswords.py", line 204, in test_output
    info['r'] = info['r_word']
KeyError: 'r_word'

I found that this happens because gpt-3.5-turbo appends a note to its output, for example:

G R A S P
E X T E N
S T A I N
A W E S T
K A R S T

Note: There can be multiple correct outputs for the same input as long as the words are valid and fit in the crossword grid.

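As a result, the last 5 lines of the output are no longer the 5 grid rows: they now include the blank line and the note. This matters because the code in tasks/crosswords.py uses output.strip().split('\n')[-5:] to take the last 5 lines as the final result: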

def test_output(self, idx: int, output: str):
    self.env.reset(idx)
    output = output.split('Output:\n')[-1]
    info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    for i, line in enumerate(output.strip().split('\n')[-5:], 1):
        letters = line.split(' ')[:5]
        word = ''.join(letters)
        word = word + '_' * (5 - len(word))
        action = f'h{i}. {word}'
        # print(action)
        _, _, _, info = self.env.step(action)
    info['r'] = info['r_word']
    return info

Thus, info ends up as {} because the function step(self, action) returns an empty dict when the word does not have 5 letters:

if len(word) != 5:
    return 'Invalid! Word should have 5 letters.', 0, False, {}
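The failure can be reproduced in isolation with a minimal sketch. Here step_stub is a hypothetical stand-in for env.step that only mimics the length check quoted above (the real environment does more), and last_info copies the word-building loop from test_output:

```python
# Model output from the example above, including the trailing note.
output = (
    "G R A S P\n"
    "E X T E N\n"
    "S T A I N\n"
    "A W E S T\n"
    "K A R S T\n"
    "\n"
    "Note: There can be multiple correct outputs for the same input "
    "as long as the words are valid and fit in the crossword grid."
)

def step_stub(action):
    """Hypothetical stand-in for env.step: only mimics the 5-letter check."""
    word = action.split('. ', 1)[1]
    if len(word) != 5:
        return 'Invalid! Word should have 5 letters.', 0, False, {}
    # Placeholder rewards; the real environment computes actual values.
    return 'ok', 0, False, {'r_word': 0, 'r_letter': 0, 'r_game': 0}

def last_info(lines):
    """Build one word per line as test_output does and return the last info."""
    info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    for i, line in enumerate(lines, 1):
        letters = line.split(' ')[:5]
        word = ''.join(letters)
        word = word + '_' * (5 - len(word))
        _, _, _, info = step_stub(f'h{i}. {word}')
    return info

lines = output.strip().split('\n')
info_last = last_info(lines[-5:])   # last line is the note, so info is {}
info_first = last_info(lines[:5])   # the five grid rows, so info has 'r_word'
print(info_last, 'r_word' in info_first)
```

With the last-5 slice, the final line is the note, which becomes a word longer than 5 characters, so the final info is the empty dict and info['r_word'] raises KeyError.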

So I changed output.strip().split('\n')[-5:] to output.strip().split('\n')[:5] and added an if statement, as follows:

def test_output(self, idx: int, output: str):
    self.env.reset(idx)
    output = output.split('Output:\n')[-1]
    info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    for i, line in enumerate(output.strip().split('\n')[:5], 1):
        letters = line.split(' ')[:5]
        word = ''.join(letters)
        word = word + '_' * (5 - len(word))
        action = f'h{i}. {word}'
        # print(action)
        _, _, _, info = self.env.step(action)
        if info == {}:
            info = {'r_word': 0, 'r_letter': 0, 'r_game': 0}
    info['r'] = info['r_word']
    return info

Is that OK?

ysymyth commented 1 year ago

Thanks for raising the issue! Yes, it should be fine. I think that in general gpt-3.5 is weaker at following format constraints, so exceptions like this are probably more likely.
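Since format deviations may also appear before the grid (for example, a preamble line), one possible hardening is to keep only the lines that look like grid rows before scoring. The sketch below is an assumption, not code from the repository: extract_grid_rows is a hypothetical helper, and it assumes a grid row is exactly five single letters separated by spaces.

```python
import re

# Assumption: a grid row is exactly five single letters separated by spaces.
ROW = re.compile(r'^[A-Za-z]( [A-Za-z]){4}$')

def extract_grid_rows(output: str, n_rows: int = 5):
    """Return the first n_rows lines that look like grid rows (hypothetical helper)."""
    # Same 'Output:' split as test_output, then drop anything that is not a row.
    body = output.split('Output:\n')[-1]
    rows = [line.strip() for line in body.split('\n') if ROW.match(line.strip())]
    return rows[:n_rows]

sample = (
    "Output:\n"
    "Here is one solution:\n"
    "G R A S P\n"
    "E X T E N\n"
    "S T A I N\n"
    "A W E S T\n"
    "K A R S T\n"
    "\n"
    "Note: There can be multiple correct outputs for the same input."
)
rows = extract_grid_rows(sample)
print(rows)
```

The returned rows could then be fed to the loop in test_output in place of the sliced lines. If fewer than five rows are found, the caller would still need to decide how to score the missing ones.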