Closed ofirpress closed 2 weeks ago
This doesn't affect just replay/human model, this certainly affects all runs.
Here's a small snippet to collect information:
_LAST_TIME = datetime.datetime.now()
def debug_time(name=""):
"""Display time delta at line number"""
cf = currentframe()
line_no = cf.f_back.f_lineno
global _LAST_TIME
time_delta = datetime.datetime.now() - _LAST_TIME
_LAST_TIME = datetime.datetime.now()
logger.debug(f"Time: {time_delta} at line {line_no} ({name=})")
then just add debug_time()
in the code at various places.
INFO 🎬 ACTION (primary)
ls
DEBUG Time: 0:00:00.003451 at line 674 (name='reset before postproc')
DEBUG Time: 0:00:00.000472 at line 677 (name='guard multiline')
DEBUG Time: 0:00:01.638950 at line 692 (name='got observation')
INFO Saved trajectory to trajectories/fuchur/human__klieret__swe-agent-test-repo__default_from_url__t-0.00__p-0.95__c-3.00__install-1/klieret__swe-agent-test-repo-i1.traj
DEBUG Time: 0:00:00.007787 at line 706 (name='traj saved')
DEBUG Time: 0:00:00.001074 at line 667 (name='')
DEBUG Time: 0:00:01.698880 at line 669 (name='state cmd')
DEBUG Time: 0:00:00.003122 at line 390 (name='')
DEBUG Time: 0:00:00.002390 at line 411 (name='')
INFO 🤖 MODEL INPUT
README.md
tests
(Open file: n/a)
(Current directory: /klieret__swe-agent-test-repo)
bash-$
so it's this (1.6s) (which probably runs the command)
for sub_action in self.split_actions(run_action):
if sub_action['agent'] == self.name or sub_action['cmd_name'] == self.config.submit_command:
obs, _, done, info = env.step(sub_action['action'])
observations.append(obs)
if sub_action['cmd_name'] == self.config.submit_command:
done = True
if done:
break
else:
agent_name = sub_action['agent']
sub_agent_output = self.call_subroutine(agent_name, sub_action, env)
observations.append(sub_agent_output)
and (~1.7s)
state = env.communicate(self.state_command) if self.state_command else None
the latter is just
state_command: Command = Command(
name="state",
code="""state() {
echo '{"working_dir": "'$(realpath --relative-to=$ROOT/.. $PWD)'"}';
};""",
)
So tl;dr, I think it's just that env.communicate
of even the most trivial commands takes at least 1.5s
performance in docker on apple m2 can be very slow if you don't use native apple virtualization and rosetta. Rancher desktop provide it but it turned off in default settings
Branch with timing information: swe-env-timing-tests
An alternative way based on end markers was tested in speedup-abort-condition
, but apparently this was tried before and resulted in rare issues of incomplete communication, so it would need some very complete testing before proceeding.
Describe the bug
@klieret brought this up and I wanted to open an issue about this.
When running a replay or when using the 'human' model, the agent seems to run slow, much slower than it should. Are we missing something here? Is there some inefficiency we could improve on?
Steps/commands/code to Reproduce
Run a replay or open a session with the --model set to human
Error message/results
The agent seems to run slow
System Information
mac m2
Checklist