reworkd / tarsier

Vision utilities for web interaction agents 👀
https://reworkd.ai
MIT License

👀 Integrate with Amazon Textract #6

Open awtkns opened 11 months ago

awtkns commented 11 months ago

Currently the only OCR service Tarsier supports is Google Vision OCR. It would be good to add another OCR service so that Amazon Textract can be used.

shubhamofbce commented 9 months ago

I think we need this ASAP, because Google Vision is not working as expected for any complex website. I am working on this.

awtkns commented 9 months ago

@shubhamofbce let me know if you need support!

plamb-viso commented 5 months ago

Bump; very interested in testing this library out using Textract output.

asim-shrestha commented 5 months ago

@plamb-viso happy to take a PR! It should be fairly straightforward as we have this somewhat abstracted.

We'd also really like to test out Azure OCR, as we've heard it's the most performant. (Will make a separate issue for this.)

asim-shrestha commented 5 months ago

And any luck @shubhamofbce ?

shubhamofbce commented 5 months ago

@asim-shrestha Sorry, I don't have an update. I looked into it a long time ago. It was straightforward, but I didn't get a chance to complete it and create a PR, and I no longer have that work with me.

asim-shrestha commented 5 months ago

No worries @shubhamofbce , did you still want to tackle this?

shubhamofbce commented 4 months ago

Sorry, but I will not be able to work on it due to time constraints. @asim-shrestha

Loeing commented 4 months ago

I think I should be able to tackle this next week

awtkns commented 3 months ago

Hey @Loeing, let me know if you need any support on this one.

Loeing commented 3 months ago

@awtkns sorry this past week has been busier than anticipated. Have been playing around with Tarsier. Should be able to make some progress by the end of next week

tvatter commented 3 months ago

@Loeing I'm super interested in the ability to integrate with Amazon Textract. Have you made any progress on this? Is there any chance I can be of some assistance?

mscully4 commented 2 months ago

Howdy! I pulled down the code and tried my hand at integrating with AWS Textract. I ran into a small problem: Textract only returns normalized geometry data (values between 0 and 1), which differs from GCP and Azure. This seems to cause an issue with this line of the format_text method, which checks spacing between annotations using 10 pixels as its baseline. Since the data is normalized, everything gets squished onto one line in the output. De-normalizing the data (multiplying the normalized values by the height/width of the image) fixed the issue and produced correct-looking output. The question I have is: would you rather I just de-normalize the Textract response data, or should the format_text function be updated to operate only on normalized values?
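
For anyone following along, here is a minimal sketch of the de-normalization step described above. It assumes a Textract `detect_document_text`-style response (a `Blocks` list whose entries carry `Geometry.BoundingBox` ratios) and that the image's pixel width and height are known. The function name `denormalize_textract_words` and the output dictionary keys are hypothetical and are not Tarsier's actual annotation format.

```python
from typing import Any, Dict, List


def denormalize_textract_words(
    response: Dict[str, Any], image_width: int, image_height: int
) -> List[Dict[str, Any]]:
    """Convert Textract WORD blocks from 0-1 ratios to pixel coordinates.

    The returned dict shape is illustrative only (an assumption), not
    the structure Tarsier uses internally.
    """
    words = []
    for block in response.get("Blocks", []):
        # Textract also returns PAGE and LINE blocks; keep single words only.
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        words.append(
            {
                "text": block["Text"],
                # Horizontal ratios scale by width, vertical ones by height.
                "left": box["Left"] * image_width,
                "top": box["Top"] * image_height,
                "width": box["Width"] * image_width,
                "height": box["Height"] * image_height,
            }
        )
    return words


# Hand-written sample shaped like a Textract response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Geometry": {"BoundingBox": {}}},
        {
            "BlockType": "WORD",
            "Text": "Hello",
            "Geometry": {
                "BoundingBox": {
                    "Left": 0.5,
                    "Top": 0.25,
                    "Width": 0.125,
                    "Height": 0.0625,
                }
            },
        },
    ]
}

pixel_words = denormalize_textract_words(sample_response, 1280, 720)
```

Converting at the Textract boundary like this would leave the 10-pixel spacing check in format_text untouched. Whether that is preferable to moving format_text onto normalized values is still the open question above.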

philipbjorge commented 4 weeks ago

Does anyone have any published WIP branches available to look at? Thanks