princeton-nlp / SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
https://princeton-nlp.github.io/SWE-agent/
MIT License
13.26k stars, 1.3k forks

Binary output causes `communicate` crash #697

Closed hojmax closed 2 weeks ago

hojmax commented 1 month ago

Describe the bug

If the agent tries to `cat` the contents of a Word file (.docx), the call to `buffer.decode()` fails and the program crashes.

Steps/commands/code to Reproduce

You can reproduce this with a repo that contains a .docx file by instructing the model to cat the file, for example:

'text://Use cat to read the content of /home/agent/resources/cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb.docx. Do not use tools like docx2text, read the raw bytes.',

Error message/results

INFO     🎬 ACTION (primary)                                                                            
             cat /home/agent/resources/cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb.docx                            

ERROR    Read with timeout failed on input:                                                             
         ---                                                                                            
         cat /home/agent/resources/cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb.docx                            

         ---                                                                                            
ERROR    Unknown exception                                                                              
         Traceback (most recent call last):
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 195, in step
             observation = self.communicate(input=action, timeout_duration=25)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 422, in communicate
             output = self._communicate(
                      ^^^^^^^^^^^^^^^^^^
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 367, in _communicate
             return self._communicate_experimental(input, timeout_duration)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 332, in _communicate_experimental
             buffer, exit_code = read_with_timeout_experimental(self.container, timeout_duration)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/utils.py", line 189, in read_with_timeout_experimental
             if PROCESS_DONE_MARKER_START in buffer.decode():
                                             ^^^^^^^^^^^^^^^
         UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 15: invalid start byte
ERROR    Read with timeout failed on input:                                                             
         ---                                                                                            
         /bin/bash -n <<'EOF'                                                                           
         state                                                                                          
         EOF                                                                                            

         ---                                                                                            
Traceback (most recent call last):
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/main.py", line 175, in main
    self.run(index)
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/main.py", line 159, in run
    info, trajectory = self.agent.run(
                       ^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/agent/agents.py", line 746, in run
    state = env.communicate(self.state_command) if self.state_command else None
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 419, in communicate
    output, valid = self._check_syntax(input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 400, in _check_syntax
    output = self._communicate(f"/bin/bash -n <<'EOF'\n{input}\nEOF\n")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 367, in _communicate
    return self._communicate_experimental(input, timeout_duration)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 332, in _communicate_experimental
    buffer, exit_code = read_with_timeout_experimental(self.container, timeout_duration)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/utils.py", line 189, in read_with_timeout_experimental
    if PROCESS_DONE_MARKER_START in buffer.decode():
                                    ^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte
WARNING  ❌ Failed on a3adc6: 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte    

System Information

MacOS M2

klieret commented 1 month ago

Thanks for reporting this. Are you on the latest version? In https://github.com/princeton-nlp/SWE-agent/pull/683 I made the decoding a bit more lenient, though that might not help here. docx files are basically binary files, so that's what trips up the decoding. (cat-ing a docx file is a bad idea anyway, because the LM probably won't make sense of the output either, but still.)
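To illustrate why binary output trips up the decoding, here is a minimal sketch in plain Python. It is not the code from #683; the helper name `decode_leniently` and the sample bytes are made up for illustration. A strict `bytes.decode()` raises on a byte such as 0x91, while passing `errors="backslashreplace"` turns undecodable bytes into escape sequences instead of raising.

```python
# Sample bytes standing in for the start of a binary file (illustrative only).
raw = b"PK\x03\x04\x91\xad"


def decode_leniently(data: bytes) -> str:
    """Decode as UTF-8, escaping undecodable bytes instead of raising.

    Hypothetical helper, not part of SWE-agent.
    """
    return data.decode("utf-8", errors="backslashreplace")


try:
    raw.decode()  # strict decoding, as in the traceback above
except UnicodeDecodeError as exc:
    print(f"strict decode failed: {exc.reason} (byte offset {exc.start})")

# The lenient variant never raises; bad bytes show up as literal \xNN text.
print(decode_leniently(raw))
```

Whether escaped binary noise is useful to the LM is a separate question, but a decode that cannot raise at least keeps the read loop alive.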

klieret commented 1 month ago

For what it's worth, if you switch to the other communication strategy by setting `export SWE_AGENT_COMMUNICATE_METHOD="processes"`, you might get around this issue. But the agent will be a lot slower...

hojmax commented 1 month ago

I was indeed not on the latest version, but #683 does not seem to handle this:

INFO     🎬 ACTION (primary)                                                                                                                                    
         cat /home/agent/resources/cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb.docx                                                                                    

ERROR    Read with timeout failed on input:                                                                                                                     
         ---                                                                                                                                                    
         cat /home/agent/resources/cffe0e32-c9a6-4c52-9877-78ceb4aaa9fb.docx                                                                                    

         ---                                                                                                                                                    
ERROR    Unknown exception                                                                                                                                      
         Traceback (most recent call last):                                                                                                                     
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 195, in step    
             observation = self.communicate(input=action, timeout_duration=25)                                                                                  
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                  
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 422, in         
         communicate                                                                                                                                            
             output = self._communicate(                                                                                                                        
                      ^^^^^^^^^^^^^^^^^^                                                                                                                        
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 367, in         
         _communicate                                                                                                                                           
             return self._communicate_experimental(input, timeout_duration)                                                                                     
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                     
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 332, in         
         _communicate_experimental                                                                                                                              
             buffer, exit_code = read_with_timeout_experimental(self.container, timeout_duration)                                                               
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                               
           File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/utils.py", line 197, in           
         read_with_timeout_experimental                                                                                                                         
             if PROCESS_DONE_MARKER_START in decoded:                                                                                                           
                                             ^^^^^^^                                                                                                            
         UnboundLocalError: cannot access local variable 'decoded' where it is not associated with a value                                                      
ERROR    Read with timeout failed on input:                                                                                                                     
         ---                                                                                                                                                    
         /bin/bash -n <<'EOF'                                                                                                                                   
         state                                                                                                                                                  
         EOF                                                                                                                                                    

         ---                                                                                                                                                    
Traceback (most recent call last):
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/main.py", line 175, in main
    self.run(index)
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/main.py", line 159, in run
    info, trajectory = self.agent.run(
                       ^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/agent/agents.py", line 746, in run
    state = env.communicate(self.state_command) if self.state_command else None
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 419, in communicate
    output, valid = self._check_syntax(input)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 400, in _check_syntax
    output = self._communicate(f"/bin/bash -n <<'EOF'\n{input}\nEOF\n")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 367, in _communicate
    return self._communicate_experimental(input, timeout_duration)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/swe_env.py", line 332, in _communicate_experimental
    buffer, exit_code = read_with_timeout_experimental(self.container, timeout_duration)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/axelhojmark/Desktop/agent-scaling-laws/metr-standard/workbench/agent-scaffolding/sweagent/environment/utils.py", line 197, in read_with_timeout_experimental
    if PROCESS_DONE_MARKER_START in decoded:
                                    ^^^^^^^
UnboundLocalError: cannot access local variable 'decoded' where it is not associated with a value
WARNING  ❌ Failed on a3adc6: cannot access local variable 'decoded' where it is not associated with a value         

Now the problem is that in this part of utils.py:

buffer += data
try:
    decoded = buffer.decode()
except UnicodeDecodeError:
    n_decode_failures += 1
    if n_decode_failures > 30:
        msg = "Too many decode failures while reading from subprocess."
        raise RuntimeError(msg)
if PROCESS_DONE_MARKER_START in decoded:
    break

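When `buffer.decode()` raises `UnicodeDecodeError` and the failure count is still within the limit, execution falls through to the `if PROCESS_DONE_MARKER_START in decoded` check. If no earlier decode has succeeded, `decoded` has never been assigned, which produces the `UnboundLocalError` above.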
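One possible way to avoid the unbound variable is to skip the marker check whenever the decode fails. This is only a sketch and not necessarily the fix that was merged. The function name `read_until_marker`, the `chunks` iterable standing in for the subprocess reads, and the marker value are assumptions made for illustration.

```python
from typing import Iterable

# Placeholder value; the real constant lives in sweagent/environment/utils.py.
PROCESS_DONE_MARKER_START = "///PROCESS-DONE:"


def read_until_marker(chunks: Iterable[bytes], max_decode_failures: int = 30) -> bytes:
    """Accumulate chunks until the done marker appears in the decoded buffer.

    Hypothetical stand-in for the loop in read_with_timeout_experimental.
    """
    buffer = b""
    n_decode_failures = 0
    for data in chunks:
        buffer += data
        try:
            decoded = buffer.decode()
        except UnicodeDecodeError:
            n_decode_failures += 1
            if n_decode_failures > max_decode_failures:
                msg = "Too many decode failures while reading from subprocess."
                raise RuntimeError(msg)
            # Nothing was decoded for this buffer, so skip the marker check
            # instead of touching an unassigned (or stale) `decoded`.
            continue
        if PROCESS_DONE_MARKER_START in decoded:
            break
    return buffer


# A multi-byte character split across two reads fails to decode once,
# then succeeds when the next chunk completes it.
out = read_until_marker([b"ok \xc3", b"\xa9 ", b"///PROCESS-DONE:0"])
print(out.decode())
```

Note that with this approach, output that is permanently undecodable (such as the raw bytes of a .docx file) still ends in the `RuntimeError` once the failure limit is exceeded; it just no longer crashes with an `UnboundLocalError`.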

klieret commented 1 month ago

Oh, that's a bug! My bad! Will fix that ASAP (unless you open a PR in the next hour; I would love to give you the credit for it ;) )

Edit: Done ✅