YAmikep closed this issue 9 months ago
Hello, the analysis of documents with a dark background is not supported in the current release. However, I have worked on it, and support will be available in the next release, which is coming soon.
I just published the new release. It seems to work OK now, but I had to use PaddleOCR because Tesseract wasn't reading the text properly. You might have to use the library only to get the table location/cells (i.e. passing no `ocr` to the function) and match them with results from a better OCR software.
Nice! 💪
> You might have to use the library only to get the table location/cells (ie providing no ocr to the function) and match it with results from a better OCR software
How would you do that? I did not look at the source code, but it sounds like img2table does not use the OCR to detect the table, is that correct?
When passing the `ocr` parameter, doesn't it run the OCR on every detected cell to resolve the content? Isn't that the same as running the OCR after using img2table?
> I did not look at the code source but it sounds that img2table does not use the OCR to detect the table then, is that correct?
I might not have been super clear. Basically, there are 2 steps:
What I was saying is that you have the possibility to omit the `ocr` parameter and retrieve the tables as well as their cell coordinates.
This enables you to use another OCR solution, or to apply some image processing tailored to your type of documents before passing the image to the OCR, in order to get better results.
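To make the "match it with results from a better OCR" idea concrete, here is a minimal sketch in plain Python. It assumes you already have cell bounding boxes (for example from tables extracted without an `ocr`) and word boxes from whichever OCR engine you prefer. The function name `assign_words_to_cells` and the tuple layouts are hypothetical and not part of img2table.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2) in pixels
Word = Tuple[int, int, int, int, str]  # word box followed by the recognized text
CellKey = Tuple[int, int]              # (row, column) index of a cell


def assign_words_to_cells(cells: Dict[CellKey, Box],
                          words: List[Word]) -> Dict[CellKey, str]:
    """Give each cell the text of the OCR words whose center falls inside it."""
    collected: Dict[CellKey, list] = {key: [] for key in cells}
    for x1, y1, x2, y2, text in words:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        for key, (cx1, cy1, cx2, cy2) in cells.items():
            if cx1 <= cx < cx2 and cy1 <= cy < cy2:
                collected[key].append((y1, x1, text))
                break  # a word belongs to at most one cell
    # Rough reading order: sort by top edge, then by left edge.
    return {key: " ".join(t for _, _, t in sorted(items))
            for key, items in collected.items()}


# Hypothetical coordinates: one row with two cells, three OCR words.
cells = {(0, 0): (0, 0, 100, 30), (0, 1): (100, 0, 200, 30)}
words = [(110, 5, 150, 25, "12.50"), (5, 5, 40, 25, "Coffee"), (45, 5, 90, 25, "shop")]
result = assign_words_to_cells(cells, words)
print(result)  # {(0, 0): 'Coffee shop', (0, 1): '12.50'}
```

Matching on the word center rather than on full containment tolerates OCR boxes that slightly overflow a cell border. Words whose center falls outside every cell are simply dropped.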
Don't know if it's clear ^^
I see. Thanks, that makes sense.
Does it support borderless tables on a black background, like in the image below?
I am trying to extract the table from some bank statements to help with tax prep, but it returns nothing. Am I doing something wrong, or does it just not support these types of tables? If it is not supported, any advice on how to transform this image first to make it work? Thanks.
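For releases without dark-background support, one possible transformation is to invert the image before detection, so that the text becomes dark on a light background. The sketch below shows the idea on a plain list of grayscale rows. The helper `invert_if_dark` and its threshold are assumptions made for illustration, not part of img2table. On a real image file you would do the same with an imaging library, for example `ImageOps.invert` from Pillow.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def invert_if_dark(pixels: Gray, threshold: float = 127.5) -> Gray:
    """Return an inverted copy when the mean brightness is below the threshold."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    if count == 0 or total / count >= threshold:
        # Already light (or empty): return an unchanged copy.
        return [row[:] for row in pixels]
    # Mostly dark: flip every value so the background becomes light.
    return [[255 - value for value in row] for row in pixels]


dark = [[0, 10], [20, 250]]      # mean brightness 70, treated as dark
light = [[255, 240], [200, 10]]  # mean brightness 176.25, treated as light
print(invert_if_dark(dark))      # [[255, 245], [235, 5]]
print(invert_if_dark(light))     # [[255, 240], [200, 10]]
```

Using the mean brightness is a crude test. A statement with a large light logo on a dark page could need a different threshold, or a per-region check.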
Versions