reworkd / tarsier

Vision utilities for web interaction agents 👀
https://reworkd.ai
MIT License
1.28k stars, 66 forks

Extract web text directly instead of OCR #51

Open eshoyuan opened 5 months ago

eshoyuan commented 5 months ago

I'm working on something pretty similar to what you guys are doing and had a thought. Why not grab text directly from the web page instead of using OCR? LangChain and LlamaIndex both have tools for this, and there are also some repos that convert HTML to Markdown.

Just a thought. Would love to know what you think!

will-holley commented 3 months ago

Seconding the ask for a Motivations section that discusses when to use this in lieu of parsing the DOM.

asim-shrestha commented 2 months ago

That's a good question. I'd be curious to see what their approaches and performance are like.

For us, it's very important to retain as much of the visual structure of the page as possible, including the positions of the text on the 2D plane. If you use just the HTML and skip actually rendering the page, you lose a lot of this information. We need it because a) we want our agents to reason about and take actions on the page just as we would, and b) automation frameworks require elements to be visible on screen before they can act on them (you cannot "click" on elements that don't actually appear on the page).

For example, suppose you had a scrollable container with 10 child elements in total, 5 of which overflow and can only be viewed by scrolling the parent container. I would imagine the other approaches would include the overflowed elements in the final representation, while we want to avoid this (because if an agent tried to click on one of these elements, it would cause an element_not_found error).
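To make the overflow example concrete, here is a rough sketch of the kind of visibility check being described. It is not Tarsier's actual implementation; `Rect`, `is_visible_in`, and `visible_children` are hypothetical names, and the rectangles are hard-coded here where a real agent would read them from the rendered page (for instance via the browser's bounding-box APIs). A DOM-only parser has no such geometry, which is why it would report all 10 children.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page pixels (hypothetical helper, not a Tarsier type)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def is_visible_in(child: Rect, container: Rect) -> bool:
    """Return True if the child overlaps the container's visible box with positive area."""
    overlap_w = min(child.right, container.right) - max(child.left, container.left)
    overlap_h = min(child.bottom, container.bottom) - max(child.top, container.top)
    return overlap_w > 0 and overlap_h > 0


def visible_children(children: list[Rect], container: Rect) -> list[Rect]:
    """Keep only the children an agent could actually see (and click) without scrolling."""
    return [c for c in children if is_visible_in(c, container)]


# A 100px-tall scrollable container holding ten stacked 20px-tall children:
# the last five overflow and are hidden until the container is scrolled.
container = Rect(left=0, top=0, width=200, height=100)
children = [Rect(left=0, top=i * 20, width=200, height=20) for i in range(10)]

visible = visible_children(children, container)
print(len(visible))  # 5
```

Only the five children inside the container's box survive the filter, so a rendering-based representation never offers the agent the hidden ones as click targets.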

Hope this makes sense; happy to elaborate further, @eshoyuan. (And apologies for the late response.) If @will-holley or anyone else wants to add this to the README, happy to take a PR!

tvatter commented 1 week ago

I agree with the motivation. One issue I see with this approach arises when a webpage embeds images that contain text but are not really actionable by the LLM. E.g., on https://app.sequence-erp.com/login, the only things that matter are the login elements on the left-hand side, but the OCR algorithm fails to recognize this:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sequence-erp.com
                         sequence
                                                                                                                                                                                       sequence                              Recherche   par    mot    - clé dans     " Active     section     "
                                                                                                                                                                                    Dashboard                       Ventes                                          Achats                                          Clients   avec   factu
                                                                                                                                                                                                                 Total   factures      impayées   15,000.00       CHF                  Total   factures       impayées   20,000.00       CHF                   Les   5  avec     le  plus   de   volu
                                                                                                                                                                                 &     Mon       espace                   En   temps                  En    retard                En   temps                 En    retard                     Client
                                                                                                                                                                                                                 5,000.00    CHF             10,000.00    CHF              10,000.00      CHF         10,000.00  CHF                      Green       Line
                                                                                                                                                                                    Projets                                                                                                                        Helio
                     Français                                                                                                                                                                                           Banking
                                                                                                                                                                                    Ventes                        Jours       depuis       la   dernière    importation      bancaire                      22    jours       ( 01/10/2022     )
                      Bonjour                       Sequencer                               !                                                                                                                            Dernière     période    importée                                          01    / 09 / 2022-01    / 10  / 2022
                                                                                                                                                                                    Achats                         Transactions    en    attente     de   réconciliation                             16   transactions
                    Vous          nous       manquiez           déjà        !                                                                                                                       Ressources   humaines                                                                                                         FOL
                                                                                                                                                                                                                 Profit &   Loss                                                                                  Les   !
                                                                                                                                                                                                                 6 derniers       mois                                                                                       Clients
                                                                                                                                                                                    Banking                                                                                                                   Four        Les     5   avec
                     Email        *
                                                                                                                                                                                    Comptabilité                                                                                                                          Client
                          [  #   0   ]
                                                                                                                                                                                    Rapports                                                                                                                            GR  Greem
                                                                                                                                                                                                                                                   I.              1.            I.
                                                                                                                                                                                                                                                                                                                           Helio
                     Mot        de   passe                    [ 1  ]      Mot    de       passe    oublié         ?
                                                                                                                                                                                                                                                                                                                          Sata  A
                          [  #   2    ] ]                                             [ $    3   ]                                                                                                                                Ventes                                                                                          Clie
                                                                                                                                                                                                                 6 derniers       mois                                                                                               M   &   L
                                                                                                                                                                                                                                                                                                                    MA  Masa
                                         [ $   4   ]    Se   connecter
                                                                                                                                                                                                                                                                                                                       Clients
                                                                                                                                                                                                                                                                                                                       attente
                     Pas    de       compte     sur      Sequence           ?    Créer       mon      compte                                                                                                                                                                                                                                     Mois   en     co
                                                                                                                                                                                                                                                 4.8     /  5    sur       Google          Avis     G
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------