reworkd / tarsier

Vision utilities for web interaction agents 👀
https://reworkd.ai
MIT License

[Enhancement] Two Phase Tagging for Webpage OCR #89

Open ml5ah opened 1 week ago

ml5ah commented 1 week ago

Currently, tagging annotations are overlaid on the webpage, so the two overlap and OCR errors become more frequent.

A two-phase tagging approach is being discussed that makes two OCR calls: one for the original webpage and one for the tagging annotations only.

Both images keep the same dimensions. The goal is to consolidate the OCR text from the two sources for better accuracy.
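A minimal sketch of the consolidation step follows. It assumes each OCR call returns words with pixel bounding boxes. Because both images share the same dimensions, the boxes live in one coordinate system and can be interleaved by reading order. `OcrWord`, `consolidate`, and `line_tolerance` are hypothetical names, not part of tarsier's API. The OCR engine is left unspecified, and the `[0]`-style tag text is only illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OcrWord:
    """One OCR'd token with its bounding box in pixels (hypothetical format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box, used to group words into lines."""
        return self.y + self.h / 2


def consolidate(page_words: List[OcrWord], tag_words: List[OcrWord],
                line_tolerance: float = 0.5) -> str:
    """Merge OCR output from the clean page image and the tag-only image.

    Assumes both images have identical dimensions, so boxes from the two
    passes are directly comparable without any coordinate transform.
    """
    # Sort every word, page text and tags alike, top-to-bottom then left-to-right.
    words = sorted(page_words + tag_words, key=lambda w: (w.cy, w.x))
    lines: List[List[OcrWord]] = []
    for word in words:
        if lines:
            ref = lines[-1][0]
            # Treat the word as part of the current line if its vertical centre
            # is within a fraction of the taller box's height.
            if abs(word.cy - ref.cy) <= line_tolerance * max(ref.h, word.h):
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, order words by their left edge and join with spaces.
    return "\n".join(
        " ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines
    )


if __name__ == "__main__":
    page = [OcrWord("Sign", 40, 10, 30, 12), OcrWord("in", 75, 10, 12, 12),
            OcrWord("Search", 40, 40, 50, 12)]
    tags = [OcrWord("[0]", 20, 11, 16, 10), OcrWord("[1]", 20, 41, 16, 10)]
    print(consolidate(page, tags))  # prints "[0] Sign in" then "[1] Search"
```

Because the tag-only pass never sees page text underneath the labels, neither pass has to separate overlapping glyphs. The merge is purely geometric, so a real implementation might still need to handle tags that straddle two text lines.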

@asim-shrestha @awtkns

asim-shrestha commented 6 days ago

Some work is being done on this in #94.