Open liaopeiyuan opened 2 weeks ago
Do you have a script to replicate this easily? When we get our SoM agents to click on a dropdown, it seems to be captured in the screenshot outputs, so this is not an issue that we've faced so far. In the screenshot above there are some problems with the SoM not covering the items, but that is more a limitation of our SoM implementation specifically.
We are using the AMI `ami-080f6d73cfce497a1` on @shuyanzhou's 2.0 branch, with a local Mac to run the tests. We ran `run_classifieds_som.sh` with rendering and could not see the dropdown rendered in the SoM screenshots. We also manually dumped screenshots within the code and were still unable to get it to display.
It does seem that "Sort By" is indeed a `<select>` HTML element as well. I wonder if there are compatibility issues with certain client versions of Playwright on different platforms that make it unable to render the selects. Would you be running the evaluation on a Linux or Windows client? Is the browser headless or headful?
We currently have the server reachable via a public IP. We also tested a third-party service (https://www.browserbase.com/) and were unable to render the dropdown there either. What would be a good email to reach out to? If you're interested, we can send you the IP address.
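For anyone who wants to compare clients, here is a minimal replication sketch that does not depend on the benchmark server. It assumes the Playwright Python sync API with Chromium installed; the page content, option labels, file names, and the helper names `build_test_page` and `capture_open_dropdown` are all made up for illustration. It only saves screenshots, and whether the option list is visible in them is exactly what needs to be checked per platform and per headless/headful mode.

```python
import html


def build_test_page(labels):
    """Return a minimal HTML page with one native <select> holding the given labels."""
    options = "\n".join(
        f'    <option value="{i}">{html.escape(label)}</option>'
        for i, label in enumerate(labels)
    )
    return (
        "<!doctype html>\n<html><body>\n"
        '  <label for="sort">Sort By</label>\n'
        '  <select id="sort">\n'
        f"{options}\n"
        "  </select>\n"
        "</body></html>"
    )


def capture_open_dropdown(path, headless=True):
    """Click the <select> and save a page screenshot to `path`."""
    # Imported lazily so build_test_page works without Playwright installed.
    from playwright.sync_api import sync_playwright

    # Hypothetical option labels, not taken from the classifieds site.
    labels = ["Newest", "Price: low to high", "Price: high to low"]
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.set_content(build_test_page(labels))
        page.click("#sort")          # ask the browser to open the option list
        page.wait_for_timeout(500)   # give any popup time to appear
        page.screenshot(path=path)   # inspect manually: is the list visible?
        browser.close()


if __name__ == "__main__":
    capture_open_dropdown("select_headless.png", headless=True)
    capture_open_dropdown("select_headful.png", headless=False)
```

Running it on both the Mac client and a Linux client, then comparing the two pairs of images, should show whether the difference is platform-specific or mode-specific.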
Playwright is unable to render `<select>` elements as they are OS-native. This would cause any vision-based computer control agent (e.g. Claude 3.5 Sonnet) to be unable to interact with the `Category` element, as it simply would not render. I also attached a short recording of a Chromium session to illustrate my point.
https://github.com/user-attachments/assets/be7ae5cc-39a2-4821-b973-2ac0947c44b2
Might this be a limitation of the benchmark? This is not really a limitation of a computer-control agent but rather of the evaluation harness. One modification could be to update the source code so that all `Category` selections are renderable by the browser. Alternatively, I may be missing some trick that allows a headful browser to render it, but I am not aware of any such technique.
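One harness-side way to approximate the first option, sketched under assumptions: a `<select>` whose `size` attribute is greater than 1 is laid out as an inline list box instead of a pop-up menu, so its options become part of the page itself. The helper name `expand_selects` and the `max_rows` cap are hypothetical, and `page` is assumed to be a Playwright `Page`. Note that this changes the layout of the site under test, so it is a workaround to experiment with rather than how the benchmark currently behaves.

```python
# JavaScript run in the page: give every <select> a size > 1 so it is laid out
# as an inline list box rather than a pop-up menu.
EXPAND_SELECTS_JS = """
(maxRows) => {
  const selects = document.querySelectorAll('select');
  for (const s of selects) {
    s.size = Math.max(2, Math.min(s.options.length, maxRows));
  }
  return selects.length;
}
""".strip()


def expand_selects(page, max_rows=10):
    """Expand all <select> elements on `page`; return how many were touched."""
    # page.evaluate calls the JS function above with max_rows as its argument.
    return page.evaluate(EXPAND_SELECTS_JS, max_rows)
```

It would be called after each navigation and before the screenshot is taken, for example `expand_selects(page)` followed by `page.screenshot(path="after.png")`.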
Any guidance is appreciated!