Issue with adding SOM annotated image observation

iMeanAI / WebCanvas

Connect agents to live web environments evaluation.

https://www.imean.ai/web-canvas

MIT License

Issue with adding SOM annotated image observation #23

Open vardaan123 opened 4 weeks ago

vardaan123 commented 4 weeks ago

Hi, thanks for the great work! I have been working on adding a SOM annotated image as an additional observation input. I need to make sure that the accessibility/DOM tree and SOM element IDs are aligned. I have tried the following approaches:

1. While constructing the DOM tree in build_dom_tree() (https://github.com/iMeanAI/WebCanvas/blob/main/agent/Environment/html_env/build_tree.py#L207), I obtain the XPath of each element, look up the Playwright element with page.locator(xpath={}), and then read that element's bounding box. However, this is very slow: it takes ~10 minutes per run.
2. I can get the bounding boxes of all elements with JS code (as in VisualWebArena, https://github.com/web-arena-x/visualwebarena/blob/b56b6d821e0b0f926fb940a7efe7d3f1246eab36/browser_env/processors.py#L809). However, the resulting IDs need to be mapped to the ones in your codebase so that element IDs in the SOM image and the DOM tree correspond. I tried to do the mapping via XPath, but Mind2Web-Live uses a different XPath/selector format from the one available in standard Playwright JS.
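
For readers following along: the cost in the first approach most likely comes from making one browser round trip per element. A minimal sketch of the batched alternative is below, assuming a Playwright `page` object. The helper names (`collect_boxes`, `drop_empty_boxes`) are hypothetical, and the XPath format built in the JS (`tag[index]` segments) is an assumption that may not match the format WebCanvas stores in its tree.

```python
from typing import Any, Dict, List

# JavaScript executed once inside the page; returns one record per element.
# The XPath format (tag[index] segments) is an assumption for illustration
# and may not match the XPath/selector format used by WebCanvas.
COLLECT_BOXES_JS = """
() => {
  const xpathOf = (el) => {
    const parts = [];
    for (let node = el; node && node.nodeType === Node.ELEMENT_NODE; node = node.parentElement) {
      let index = 1;
      for (let sib = node.previousElementSibling; sib; sib = sib.previousElementSibling) {
        if (sib.tagName === node.tagName) index += 1;
      }
      parts.unshift(node.tagName.toLowerCase() + '[' + index + ']');
    }
    return '/' + parts.join('/');
  };
  return Array.from(document.querySelectorAll('*')).map((el) => {
    const r = el.getBoundingClientRect();
    return {xpath: xpathOf(el), x: r.x, y: r.y, width: r.width, height: r.height};
  });
}
"""


def collect_boxes(page: Any) -> List[Dict[str, Any]]:
    """Fetch every element's bounding box with a single page.evaluate call."""
    return page.evaluate(COLLECT_BOXES_JS)


def drop_empty_boxes(boxes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only boxes with a positive area, i.e. elements that are rendered."""
    return [b for b in boxes if b["width"] > 0 and b["height"] > 0]
```

This only removes the per-element round trips; it does not by itself solve the ID alignment problem described in the second approach.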

Kindly let me know if you have any suggestions for implementing this. Looking forward to your response!

han032206 commented 3 weeks ago

Hi, thank you for bringing up this issue! We are currently working on implementing a unique identifier attribute for each element. This identifier will allow mapping between the elements in the WebCanvas accessibility tree and those in VisualWebArena. With it, developers should be able to locate and match elements across the two environments without relying solely on XPaths or selectors.

We're in the process of implementing this solution, and we'll keep you updated on the progress. If you encounter any other issues or have further feedback on this functionality, please feel free to reach out.

vardaan123 commented 3 weeks ago

Sounds good, looking forward to your implementation!

han032206 commented 3 weeks ago

Hi Vardaan,

We have committed this implementation to the exp branch. Each element is now assigned a unique ID. Additionally, we have implemented two functions: one to find elements directly by ID (locate_by_id) and another to get selectors by ID (get_selector_by_id). You can use either function to locate an element and execute actions on it.

If you encounter any issues or have further feedback, please feel free to reach out!

vardaan123 commented 3 weeks ago

Thanks, let me try that.

vardaan123 commented 2 weeks ago

Hi, thanks for implementing locate_by_id() and get_selector_by_id(). However, this unique ID does not correspond to the node IDs in the JS code that VWA uses to obtain bounding boxes. I need to efficiently obtain the bounding box coordinates for all elements in the tree.

I think the reason may be that you are using document.querySelectorAll('*'), whereas their code uses a different subset of interactable elements: https://github.com/web-arena-x/visualwebarena/blob/b56b6d821e0b0f926fb940a7efe7d3f1246eab36/browser_env/processors.py#L827
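
To illustrate the suspected mismatch: if both sides number elements by enumeration order, numbering over all elements and numbering over a filtered subset give different IDs for the same element. The toy sketch below assumes that numbering scheme. The tag list and the `INTERACTABLE` set are hypothetical and do not reproduce VWA's actual filter.

```python
from typing import Dict, List, Set

# Tags of a toy page in document order (hypothetical example data).
TAGS: List[str] = ["html", "body", "div", "a", "span", "button", "input"]

# Hypothetical interactable subset; VWA's real filter differs in detail.
INTERACTABLE: Set[str] = {"a", "button", "input"}


def ids_over_all(tags: List[str]) -> Dict[int, str]:
    """Number every element, as a querySelectorAll('*') enumeration would."""
    return {i: tag for i, tag in enumerate(tags)}


def ids_over_subset(tags: List[str], keep: Set[str]) -> Dict[int, str]:
    """Number only the elements that pass the filter."""
    return {i: tag for i, tag in enumerate(t for t in tags if t in keep)}


all_ids = ids_over_all(TAGS)
subset_ids = ids_over_subset(TAGS, INTERACTABLE)
# The same <a> element gets ID 3 in one scheme and ID 0 in the other.
```

An identifier stored on the element itself, rather than derived from enumeration order, avoids this divergence.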

han032206 commented 2 weeks ago

Hi Vardaan, what we implemented adds an attribute with a unique ID to every element, without filtering them. I think you can apply VWA's method for filtering elements and calculating bounding boxes, and read this unique ID attribute from each element it keeps. With this unique ID, you can then locate the corresponding elements or selectors in the WebCanvas accessibility tree. If I have misunderstood, please let me know.
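
A minimal sketch of this suggestion, assuming a Playwright `page` object: collect bounding boxes keyed by the unique ID attribute in one call, then pair them with the tree's node IDs. The attribute name `data-unique-id` and the helper names are hypothetical; substitute whatever the exp branch actually sets. The VWA-style filter is left as a comment rather than reproduced.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical attribute name; replace with the one set in the exp branch.
UNIQUE_ID_ATTR = "data-unique-id"

# Runs once in the page and returns {unique_id: bounding box}.
BOXES_BY_ID_JS = """
(attr) => {
  const out = {};
  for (const el of document.querySelectorAll('[' + attr + ']')) {
    // A VWA-style interactable filter could be applied here before recording.
    const r = el.getBoundingClientRect();
    out[el.getAttribute(attr)] = {x: r.x, y: r.y, width: r.width, height: r.height};
  }
  return out;
}
"""


def boxes_by_unique_id(page: Any, attr: str = UNIQUE_ID_ATTR) -> Dict[str, Dict[str, float]]:
    """Return bounding boxes keyed by the unique ID attribute value."""
    return page.evaluate(BOXES_BY_ID_JS, attr)


def attach_boxes(
    tree_ids: Iterable[Any], boxes: Dict[str, Dict[str, float]]
) -> Dict[str, Optional[Dict[str, float]]]:
    """Pair each tree node ID with its box, or None when no box was recorded."""
    return {str(node_id): boxes.get(str(node_id)) for node_id in tree_ids}
```

Because attribute values come back from the DOM as strings, the join converts tree IDs to strings before looking them up.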

vardaan123 commented 2 weeks ago

Sounds good, thanks. Let me try that.