xavctn / img2table

img2table is a Python library for identifying and extracting tables from PDFs and images, based on OpenCV image processing
MIT License

can not extract table #178

Open sanjivjha opened 6 months ago

sanjivjha commented 6 months ago

KG-DWN-98-2 Audited EOY-2013-14.pdf

Kindly look at the table on page 7. img2table is not able to extract the table data. I am using Tesseract. Kindly advise.
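A minimal sketch of how the page-7 extraction might be reproduced with img2table and Tesseract. It assumes a recent img2table release in which `PDF(src=..., pages=[...])` takes zero-based page indexes and `extract_tables(ocr=..., min_confidence=...)` returns a dict keyed by page index. The helper names `to_page_index` and `extract_tables_with_tesseract` are hypothetical, not part of img2table; img2table is imported lazily so the helpers can be defined without it installed.

```python
from typing import Dict, List


def to_page_index(page_number: int) -> int:
    """Convert a 1-based page number (as printed in a viewer) to a 0-based index.

    Hypothetical helper, not part of img2table.
    """
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    return page_number - 1


def extract_tables_with_tesseract(
    pdf_path: str, page_number: int, lang: str = "eng", min_confidence: int = 50
) -> List:
    """Run img2table on a single PDF page with Tesseract as the OCR engine.

    Hypothetical helper. Assumes img2table and a Tesseract binary are installed.
    """
    # Lazy imports: only needed when extraction actually runs.
    from img2table.document import PDF
    from img2table.ocr import TesseractOCR

    page_index = to_page_index(page_number)
    pdf = PDF(src=pdf_path, pages=[page_index])
    ocr = TesseractOCR(n_threads=1, lang=lang)
    # Assumed return shape: {page_index: [ExtractedTable, ...]}
    tables_by_page: Dict[int, List] = pdf.extract_tables(
        ocr=ocr, min_confidence=min_confidence
    )
    return tables_by_page.get(page_index, [])


# Usage (requires the attached PDF):
# for table in extract_tables_with_tesseract("KG-DWN-98-2 Audited EOY-2013-14.pdf", 7):
#     print(table.df)
```

Keeping `min_confidence` as a parameter makes it easy to test whether low-confidence Tesseract words are being discarded.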

huyfififi commented 1 month ago

This library simply uses the results of the OCR (Optical Character Recognition) engine you specify, so I bet the issue lies in the OCR configuration. Have you tried other OCR tools, such as DocTR?
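Switching engines should only require passing a different OCR object to `extract_tables`. The following is a minimal sketch, assuming img2table's `TesseractOCR` and `DocTR` classes from `img2table.ocr` (DocTR also needs its own dependencies installed). `build_ocr` and `extract_page_tables` are hypothetical helpers, and only the two engines discussed in this thread are handled.

```python
from typing import List


def build_ocr(engine: str, lang: str = "eng"):
    """Return an img2table OCR instance for 'tesseract' or 'doctr'.

    Hypothetical helper. Unsupported names fail before img2table is imported.
    """
    name = engine.strip().lower()
    if name == "tesseract":
        from img2table.ocr import TesseractOCR
        return TesseractOCR(n_threads=1, lang=lang)
    if name == "doctr":
        from img2table.ocr import DocTR
        return DocTR()  # `lang` is ignored here; default DocTR settings assumed
    raise ValueError(f"unsupported OCR engine: {engine!r}")


def extract_page_tables(pdf_path: str, page_index: int, engine: str) -> List:
    """Extract tables from one zero-based PDF page with the chosen OCR engine.

    Hypothetical helper; assumes extract_tables returns {page_index: [tables]}.
    """
    from img2table.document import PDF

    pdf = PDF(src=pdf_path, pages=[page_index])
    return pdf.extract_tables(ocr=build_ocr(engine)).get(page_index, [])


# Usage: compare engines on the same page
# for engine in ("tesseract", "doctr"):
#     for table in extract_page_tables("KG-DWN-98-2 Audited EOY-2013-14.pdf", 6, engine):
#         print(engine, table.df, sep="\n")
```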

huyfififi commented 1 month ago

DocTR gives me better (though still not good) results than Tesseract:

Tesseract

      0     1     2
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None
4  None  None  None
5  None  None  None

      0     1
0  None  None
1  None  None
2  None  None
3  None  None

      0     1     2
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None
4  None  None  None
5  None  None  None
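In the Tesseract output above, img2table still finds three table grids, but every cell is `None`. That pattern suggests the structure was detected while no OCR text ended up in the cells. A small check like the one below can flag such tables automatically. It is a sketch assuming each table is passed as a list of rows (for an img2table `ExtractedTable`, something like `table.df.values.tolist()`); the function names are hypothetical.

```python
import math
from typing import Any, Sequence


def is_blank_cell(value: Any) -> bool:
    """True for cells with no recognised text: None, NaN or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def blank_cell_ratio(rows: Sequence[Sequence[Any]]) -> float:
    """Fraction of cells in a table (given as a list of rows) that contain no text."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 1.0  # a table with no cells is treated as fully blank
    return sum(is_blank_cell(cell) for cell in cells) / len(cells)


def ocr_text_missing(rows: Sequence[Sequence[Any]], threshold: float = 1.0) -> bool:
    """Flag a table whose grid was found but whose blank-cell ratio reaches `threshold`."""
    return blank_cell_ratio(rows) >= threshold


# Usage with img2table results (hypothetical variable names):
# for table in tables:
#     if ocr_text_missing(table.df.values.tolist()):
#         print("grid detected, but no OCR text was placed in it")
```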

DocTR

                                                   0                                                  1
0  BLOCK KG DWN 98/2 VENTURE\n(Operator Oil & Nat...  BLOCK KG DWN 98/2 VENTURE\n(Operator Oil & Nat...
1                                        PARTICULARS                                      AMOUNT IN INR
2  TOTAL CONTRIBUTION RECEIVED TILL 31.03.2014 & ...                                         2124443890
3          TOTAL PENDING CASH CALLS AS ON 31.03.2014                                                NIL

                                                   0                                                1                                                2
0    Schedule 7(a) : Cash Call Contribution - PIB-BV  Schedule 7(a) : Cash Call Contribution - PIB-BV  Schedule 7(a) : Cash Call Contribution - PIB-BV
1                                        PARTICULARS                                    AMOUNT IN INR                                    AMOUNT IN USD
2  TOTAL CASH CALL CONTRIBUTION RECEIVED TILL 30....                                        602260812                                         14942722
3                                  CASH CALL PENDING                                             None                                             None
4                                                NIL                                                0                                                0
5           TOTAL CASH CALL PENDING AS ON 30.03.2014                                                0                                                0

                                                   0                                              1                                              2
0      Schedule 8(a) : Cash Call Contribution - HOEI  Schedule 8(a) : Cash Call Contribution - HOEI  Schedule 8(a) : Cash Call Contribution - HOEI
1                                        PARTICULARS                                  AMOUNT IN INR                                  AMOUNT IN USD
2  TOTAL CASH CALL CONTRIBUTION RECEIVED TILL 31....                                      367418583                                        9110606
3                                  CASH CALL PENDING                                           None                                           None
4                                                NIL                                           None                                           None
5           TOTAL CASH CALL PENDING AS ON 31.03.2014                                              0                                              0
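In the DocTR output, row 0 of each table repeats the same schedule title in every column, which looks like a single title cell spanning the full width. Row 1 holds the column headers (PARTICULARS, AMOUNT IN INR, AMOUNT IN USD). A post-processing sketch, assuming that pattern holds, could split the title off and promote the header row. The name `split_title_and_header` is hypothetical and works on a list of rows, such as `table.df.values.tolist()`.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[str]]


def split_title_and_header(
    rows: Sequence[Row],
) -> Tuple[Optional[str], List[Optional[str]], List[List[Optional[str]]]]:
    """Return (title, header, body) for a table whose first row may be a spanning title.

    Hypothetical helper. A row counts as a title when it has more than one column
    and every cell repeats the same non-None value.
    """
    if not rows:
        return None, [], []
    first = list(rows[0])
    title: Optional[str] = None
    start = 0
    if len(first) > 1 and first[0] is not None and all(c == first[0] for c in first):
        title = first[0]
        start = 1  # header is the next row
    if start >= len(rows):
        return title, [], []
    header = list(rows[start])
    body = [list(row) for row in rows[start + 1:]]
    return title, header, body


# With pandas available, the result could be rebuilt as a DataFrame:
# title, header, body = split_title_and_header(table.df.values.tolist())
# clean = pandas.DataFrame(body, columns=header)
```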