princeton-nlp / SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.
https://princeton-nlp.github.io/SWE-agent/
MIT License

Speed up communication with docker container bash #149

Closed · ofirpress closed this 2 weeks ago

ofirpress commented 5 months ago

Describe the bug

@klieret brought this up and I wanted to open an issue about this.

When running a replay or when using the 'human' model, the agent seems to run slow, much slower than it should. Are we missing something here? Is there some inefficiency we could improve on?

Steps/commands/code to Reproduce

Run a replay or open a session with the --model set to human

Error message/results

The agent seems to run slow

System Information

mac m2

klieret commented 5 months ago

This doesn't affect just the replay/human model; it affects all runs.

Here's a small snippet to collect information:

import datetime
import logging
from inspect import currentframe

logger = logging.getLogger(__name__)

_LAST_TIME = datetime.datetime.now()

def debug_time(name=""):
    """Log the time elapsed since the previous call, tagged with the caller's line number."""
    global _LAST_TIME
    line_no = currentframe().f_back.f_lineno
    now = datetime.datetime.now()
    logger.debug(f"Time: {now - _LAST_TIME} at line {line_no} ({name=})")
    _LAST_TIME = now

Then just add debug_time() at various places in the code.

klieret commented 5 months ago
INFO     🎬 ACTION (primary)
         ls
DEBUG    Time: 0:00:00.003451 at line 674 (name='reset before postproc')
DEBUG    Time: 0:00:00.000472 at line 677 (name='guard multiline')
DEBUG    Time: 0:00:01.638950 at line 692 (name='got observation')
INFO     Saved trajectory to trajectories/fuchur/human__klieret__swe-agent-test-repo__default_from_url__t-0.00__p-0.95__c-3.00__install-1/klieret__swe-agent-test-repo-i1.traj
DEBUG    Time: 0:00:00.007787 at line 706 (name='traj saved')
DEBUG    Time: 0:00:00.001074 at line 667 (name='')
DEBUG    Time: 0:00:01.698880 at line 669 (name='state cmd')
DEBUG    Time: 0:00:00.003122 at line 390 (name='')
DEBUG    Time: 0:00:00.002390 at line 411 (name='')
INFO     🤖 MODEL INPUT
         README.md
         tests

         (Open file: n/a)
         (Current directory: /klieret__swe-agent-test-repo)
         bash-$

So the 1.6 s is spent here (this is presumably what actually runs the command):

for sub_action in self.split_actions(run_action):
    if sub_action['agent'] == self.name or sub_action['cmd_name'] == self.config.submit_command:
        obs, _, done, info = env.step(sub_action['action'])
        observations.append(obs)
        if sub_action['cmd_name'] == self.config.submit_command:
            done = True
        if done:
            break
    else:
        agent_name = sub_action['agent']
        sub_agent_output = self.call_subroutine(agent_name, sub_action, env)
        observations.append(sub_agent_output)

and the ~1.7 s is spent here:

state = env.communicate(self.state_command) if self.state_command else None

The latter is just

    state_command: Command = Command(
        name="state",
        code="""state() {
            echo '{"working_dir": "'$(realpath --relative-to=$ROOT/.. $PWD)'"}';
        };""",
    )

So, tl;dr: I think env.communicate takes at least 1.5 s even for the most trivial commands.
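For a rough baseline (assuming a local `bash` outside Docker is available): even spawning a brand-new shell for every trivial command, which is a worst case compared to a persistent session, stays well below 1.5 s. That suggests the overhead sits in the container communication layer rather than in bash itself:

```python
import subprocess
import time

# Worst-case baseline: spawn a fresh bash for a trivial command and time it.
start = time.monotonic()
result = subprocess.run(["bash", "-c", "echo hello"], capture_output=True, text=True)
elapsed = time.monotonic() - start
print(f"{result.stdout.strip()!r} took {elapsed:.3f}s")
```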

RudolphSikorskiy commented 5 months ago

Performance in Docker on an Apple M2 can be very slow if you don't use native Apple virtualization and Rosetta. Rancher Desktop provides them, but they are turned off in the default settings.

klieret commented 4 months ago

Branch with timing information: swe-env-timing-tests

klieret commented 4 months ago

An alternative approach based on end markers was tested in speedup-abort-condition, but apparently this was tried before and resulted in rare cases of incomplete communication, so it would need very thorough testing before proceeding.
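A minimal sketch of the end-marker idea (the sentinel string and `communicate` helper here are hypothetical, not the actual speedup-abort-condition code): write the command followed by an echoed sentinel, then read output until the sentinel line appears, instead of waiting on a fixed timeout. The rare incomplete-communication issue mentioned above would bite exactly in the read loop, e.g. if command output were still buffered when the sentinel arrives.

```python
import subprocess

SENTINEL = "__SWE_AGENT_CMD_DONE__"  # hypothetical marker, not from the code base

# Persistent shell, analogous to the container session (but local, not Docker).
proc = subprocess.Popen(
    ["bash"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, bufsize=1
)

def communicate(cmd: str) -> str:
    """Run cmd in the persistent shell; return its output up to the end marker."""
    proc.stdin.write(f"{cmd}\necho {SENTINEL}\n")
    proc.stdin.flush()
    lines = []
    for line in proc.stdout:
        if line.rstrip("\n") == SENTINEL:
            break
        lines.append(line)
    return "".join(lines)

out = communicate("echo hello")
print(out.strip())
proc.stdin.close()
proc.wait()
```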