web-arena-x / webarena

Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
https://webarena.dev
Apache License 2.0

Environment is very slow due to playwright and inspect.stack() #66

Closed wookayin closed 1 month ago

wookayin commented 8 months ago

I find the environment frustratingly slow: it takes around 10 seconds for a single step transition, or for just calling env.reset() once.

Profiling shows the following:

[image: profiler output]

Most of the time is spent in fetch_page_accessibility_tree, more precisely in self.get_bounding_client_rect:

https://github.com/web-arena-x/webarena/blob/main/browser_env/processors.py#L394-L396

and the root cause is playwright:

setattr(task, "__pw_stack__", inspect.stack())
setattr(task, "__pw_stack_trace__", traceback.extract_stack())

Note that this is called for EVERY node in the DOM tree, which is very inefficient. It doesn't make much sense to me that playwright's sync_base API needs stack trace information at all. Have you run into this before? Is there any workaround or known solution to make fetch_page_accessibility_tree more efficient?
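To illustrate why capturing the stack on every call adds up, here is a minimal sketch that uses only the standard library. It does not touch playwright; the function names, call depth, and call count are assumptions chosen for illustration. It compares a call that ends with inspect.stack() against the same call without it.

```python
import inspect
import time
from typing import Callable


def call_with_stack_capture(depth: int) -> int:
    """Recurse `depth` levels, then capture the full stack (the costly part)."""
    if depth > 0:
        return call_with_stack_capture(depth - 1)
    # inspect.stack() builds a FrameInfo record for every frame on the stack.
    return len(inspect.stack())


def call_without_stack_capture(depth: int) -> int:
    """Same recursion, but without capturing the stack at the bottom."""
    if depth > 0:
        return call_without_stack_capture(depth - 1)
    return 0


def time_calls(fn: Callable[[int], int], n_calls: int, depth: int) -> float:
    """Return the wall-clock seconds spent calling `fn(depth)` n_calls times."""
    start = time.perf_counter()
    for _ in range(n_calls):
        fn(depth)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical numbers: 200 calls stand in for 200 DOM nodes.
    n_calls, depth = 200, 30
    with_stack = time_calls(call_with_stack_capture, n_calls, depth)
    without_stack = time_calls(call_without_stack_capture, n_calls, depth)
    print(f"with inspect.stack():    {with_stack:.4f} s")
    print(f"without inspect.stack(): {without_stack:.4f} s")
```

The absolute numbers depend on the machine and the stack depth, so treat the output as a rough indication rather than a benchmark of playwright itself.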

shuyanzhou commented 8 months ago

Hi @wookayin, I wasn't able to reproduce the slowness from my end. I printed the time spent before and after env.step in this script and the output is:

Time taken: 4.67 seconds
Time taken: 3.58 seconds
Time taken: 3.39 seconds
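A measurement like the one above can be sketched with a small timing helper. This is not the script referred to in the comment; `timed` is a hypothetical helper, and `env` and `action` in the comment are placeholders for whatever environment and action object you already have.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # With a real environment this would look like (placeholders, not run here):
    #     step_result, elapsed = timed(env.step, action)
    # A stand-in function keeps the sketch runnable on its own.
    result, elapsed = timed(sum, range(1_000_000))
    print(f"Time taken: {elapsed:.2f} seconds")
```

time.perf_counter() is used because it is a monotonic, high-resolution clock intended for measuring short durations.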

Could it be a problem with the local machine? Feel free to follow up with more information.

wookayin commented 8 months ago

Yes, that script also runs very slowly for me. I don't think 4 seconds per environment step is fast enough.

Can you please try the following?

$ pip install py-spy

$ py-spy record python scripts/collect_obs.py

You will be able to see some profiling results; if you can share them with me, that would be helpful. Given your timing information, my guess is that it's taking a lot of time there too. I am attaching one for your reference; inspect.stack() takes most of the time in my environment.
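If installing py-spy is not an option, a rough in-process alternative is the standard library's cProfile. The sketch below is an assumption-laden substitute, not the procedure used in this thread: `profile_call` is a hypothetical helper, and cProfile is a deterministic profiler that adds its own overhead, so its numbers will not match a sampling profiler exactly.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_call(fn: Callable[..., Any], *args: Any, top: int = 15, **kwargs: Any) -> str:
    """Run fn under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show only the `top` entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in workload; with a real environment one might profile
    # something like `env.reset` instead (placeholder, not run here).
    print(profile_call(sorted, list(range(100_000, 0, -1))))
```

In the report, entries such as inspect.stack or traceback.extract_stack with a large cumulative time would point to the same hot spot described above.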

My environment uses playwright==1.32.1 and python==3.11.5.

webarena-slow-env-GH-66

https://github.com/web-arena-x/webarena/assets/1009873/c24c297a-3fd8-4117-97ff-2dbdb9337119

shuyanzhou commented 7 months ago

Super sorry for the late reply. Here is the output from my end with playwright==1.32.1 and python==3.10.12:

python-2023-11-28T10:33:58-05:00

https://github.com/web-arena-x/webarena/assets/29911200/658863ce-bedc-4200-8ee9-84265dbd3c0d

wookayin commented 7 months ago

So quite a lot of time is spent on your machine too, redundantly extracting stack frames for exactly the same reason.

shuyanzhou commented 7 months ago

Although I am not experiencing significant slowness in terms of wall-clock time (each fetch_accessibility_tree call took 2 to 3 seconds), I will look into the problem.

Basically, the current implementation for getting the bounding box of each element might compromise the efficiency of observation rendering, although it is so far the most accurate way to get the bounding box that I can think of. If you have any ideas, feel free to follow up.

cc. @frankxu2004 Thoughts?

wookayin commented 7 months ago

Depending on the machine, it can be significantly slower. Even in your environment, 2 to 3 seconds per step doesn't sound like a reasonable speed: website interaction itself is very fast, and the vast majority of the time is wasted on stack trace management to emulate coroutines in a synchronous fashion. It appears that the synchronous playwright APIs are meant mainly for debugging or prototyping, not for primary use, because of their poor performance.

Actually, this is a problem in the underlying library, microsoft/playwright. I think in principle one can avoid it by using the asynchronous APIs instead of the synchronous ones (AsyncScriptBrowserEnv). WebArena would also benefit a lot from avoiding the synchronous APIs, but user applications (e.g. agents implemented in a research project) would need non-trivial effort to fully migrate to the asynchronous APIs.

neubig commented 7 months ago

Hi @wookayin ! This is not a major blocker for us so we might not be able to spend a lot of time on it now, but we'd welcome a PR that makes things faster. Your approach seems reasonable.

wookayin commented 7 months ago

Yes, I agree with you and understand that! Thanks for your messages and help. If I can come up with a good workaround or improvement, I will be happy to contribute it back.

shuyanzhou commented 1 month ago

Throwback response -- BrowserGym did a very nice implementation of this by injecting JS instead of making client calls.

I attempted to incorporate it into our codebase but found the observation difference made our previous results not reproducible. We decided to keep our current implementation, but feel free to check it out if it is still interesting to you.
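For readers curious about the general idea of injecting JS, here is a minimal sketch. It is not BrowserGym's implementation and not WebArena's code: `COLLECT_RECTS_JS` and `collect_bounding_rects` are hypothetical names, and `page` is only assumed to expose Playwright's `page.evaluate(expression)` method. The point is that one script evaluated in the page returns every bounding rectangle in a single round trip, instead of one client call per node.

```python
from typing import Any, Dict, List

# JavaScript evaluated once inside the page. It walks all elements and
# returns their bounding rectangles as plain, JSON-serializable objects.
COLLECT_RECTS_JS = """
() => Array.from(document.querySelectorAll('*')).map((el, index) => {
    const r = el.getBoundingClientRect();
    return {index: index, tag: el.tagName.toLowerCase(),
            x: r.x, y: r.y, width: r.width, height: r.height};
})
"""


def collect_bounding_rects(page: Any) -> List[Dict[str, Any]]:
    """Fetch all element rectangles with a single page.evaluate() call.

    `page` is assumed to behave like a Playwright Page object.
    """
    rects = page.evaluate(COLLECT_RECTS_JS)
    # Illustrative choice: drop zero-area elements, which are not visible.
    return [r for r in rects if r["width"] > 0 and r["height"] > 0]
```

Note that this sketch indexes elements by DOM order and does not map them back to accessibility tree nodes, so the resulting observations would differ from the current implementation, in line with the reproducibility concern mentioned above.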