lavague-ai / LaVague

Large Action Model framework to develop AI Web Agents
https://docs.lavague.ai/en/latest/
Apache License 2.0

Added unique Xpath retriever #574

Closed: dhuynh95 closed this 3 months ago

dhuynh95 commented 3 months ago

I noticed that on some websites we retrieve elements that are visible but have exactly the same bounding box (i.e. they overlap completely), which adds noise to the LLM's input.

Here is an example without removing duplicates:

[image: example without duplicate removal]

Here is an example with unique elements based on bounding boxes:

[image: example with unique elements based on bounding boxes]

We can see that we have almost 10 times fewer tokens after removing the duplicated elements, for the same highlighted elements!
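To make the idea concrete, here is a minimal sketch of deduplicating by bounding box. It is not the code from this PR: the function name `unique_xpaths_by_bounding_box`, the `(x, y, width, height)` tuple layout, and the choice to keep the first XPath seen for each box are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (x, y, width, height) in page pixels; this layout is an assumption.
BoundingBox = Tuple[float, float, float, float]


def unique_xpaths_by_bounding_box(boxes: Dict[str, BoundingBox]) -> List[str]:
    """Return one XPath per distinct bounding box, keeping the first one seen."""
    seen = set()
    unique = []
    for xpath, box in boxes.items():
        if box in seen:
            # Another element already covers exactly this rectangle.
            continue
        seen.add(box)
        unique.append(xpath)
    return unique


# Hypothetical input: a wrapper div and its link occupy the same rectangle.
boxes = {
    "/html/body/div[1]": (10, 20, 200, 40),
    "/html/body/div[1]/a": (10, 20, 200, 40),
    "/html/body/div[2]/button": (10, 80, 120, 32),
}
print(unique_xpaths_by_bounding_box(boxes))
```

Keeping the first XPath means the outermost element wins when the input is in document order. Keeping the last one instead would favor the innermost element, which may be the better choice if the inner node is the one that is actually interactive.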

@adeprez: note that this PR requires extract_xpaths_from_html, which I introduced in #573.