qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

No segmentation results for specific image - (due to detecting 6 columns when there is only 1?) #83

Closed: sjscotti closed this issue 1 year ago

sjscotti commented 1 year ago

Hi, I have run over 2000 images through eynollah as an OCR-D processor, and only one gave me this problem. No error was reported, but the mets.xml file contains no segmentation results. The only thing I know is that the image was detected as having 6 columns when there is actually only 1 column. The data for this case is below. Thanks in advance!

Image processed (attached): bqwndyazflxxtnmszruzzmlyofrxvtzc_s089_1561992285623

OCR-D eynollah command and console output:

(qurator) D:\qurator>ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-IMG-SEG -P models eynollah/models_eynollah -P dpi 360 -P allow_scaling true
10:32:03.361 INFO eynollah - INPUT FILE P_00738 (1/1)

10:32:03.809 INFO eynollah - Resizing and enhancing image...
10:32:03.809 INFO eynollah - Detected 360 DPI
1/1 [==============================] - 3s 3s/step
1/1 [==============================] - 1s 646ms/step
10:32:11.003 INFO eynollah - Found 6 columns ([[0.09252959 0.01236904 0.0052445  0.05544147 0.01741231 0.8170032 ]])
10:32:11.003 INFO eynollah - Image was not enhanced.
1/1 [==============================] - 1s 732ms/step
1/1 [==============================] - 1s 628ms/step
10:32:15.064 INFO eynollah - Found 6 columns ([[0.09252959 0.01236904 0.0052445  0.05544147 0.01741231 0.8170032 ]])
1/1 [==============================] - 1s 833ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 22ms/step

...
NOTE: many similar lines
...

1/1 [==============================] - 0s 24ms/step
1/1 [==============================] - 0s 24ms/step
1/1 [==============================] - 0s 24ms/step
1/1 [==============================] - 0s 28ms/step
10:34:32.242 INFO eynollah - Textregion detection took 114.8s
1/1 [==============================] - 1s 733ms/step
10:34:35.931 INFO eynollah - Graphics detection took 3.7s
1/1 [==============================] - 1s 769ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 31ms/step

...
NOTE: many similar lines
...

1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 0s 22ms/step
10:34:59.506 INFO eynollah - textline detection took 23.6s
10:36:15.179 INFO eynollah - slope_deskew: -90.0
10:36:15.179 INFO eynollah - deskewing took 75.7s
10:36:15.332 INFO eynollah - detection of marginals took 0.2s
1/1 [==============================] - 2s 2s/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 16ms/step

...
NOTE: many similar lines
...

1/1 [==============================] - 0s 22ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 1s 764ms/step
10:39:20.846 INFO eynollah - Job done in 437.0s
10:39:21.100 INFO ocrd.process.profile - Executing processor 'ocrd-eynollah-segment' took 437.728415s (wall) 813.187500s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-IMG-SEG' --parameter='{"models": "eynollah/models_eynollah", "dpi": 360, "allow_scaling": true, "full_layout": true, "curved_line": false, "headers_off": false}' --page-id='']
10:39:21.100 INFO ocrd.workspace.save_mets - Saving mets 'D:\qurator\mets.xml'

And below are the contents of the mets.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
  <mets:metsHdr CREATEDATE="2022-07-22T10:29:16.958375">
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
      <mets:name>ocrd/core v2.34.0</mets:name>
    </mets:agent>
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER" OTHERROLE="layout/segmentation/region">
      <mets:name>ocrd-eynollah-segment v0.0.11</mets:name>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="input-file-grp">OCR-D-IMG</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="output-file-grp">OCR-D-IMG-SEG</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="parameter">{"models": "eynollah/models_eynollah", "dpi": 360, "allow_scaling": true, "full_layout": true, "curved_line": false, "headers_off": false}</mets:note>
      <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="page-id"/>
    </mets:agent>
  </mets:metsHdr>
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
          <mods:identifier type="purl">'test'</mods:identifier>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec ID="AMD">
    </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="OCR-D-IMG_00738" MIMETYPE="image/png">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG\bqwndyazflxxtnmszruzzmlyofrxvtzc_s089_1561992285623.png"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="OCR-D-IMG-SEG">
      <mets:file ID="OCR-D-IMG-SEG_00738" MIMETYPE="application/vnd.prima.page+xml">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG-SEG\OCR-D-IMG-SEG_00738.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="P_00738">
        <mets:fptr FILEID="OCR-D-IMG_00738"/>
        <mets:fptr FILEID="OCR-D-IMG-SEG_00738"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
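Note that mets.xml only references the output files; the segmentation itself is written to the PAGE-XML file listed under the OCR-D-IMG-SEG file group (OCR-D-IMG-SEG\OCR-D-IMG-SEG_00738.xml, MIME type application/vnd.prima.page+xml), so that is the file to inspect for detected regions and lines. A minimal sketch using only Python's standard library counts the TextRegion and TextLine elements in such a file. The function name count_page_elements and the sample document are hypothetical, and matching on the local tag name is a choice made here so the sketch accepts any PAGE namespace version.

```python
import xml.etree.ElementTree as ET


def count_page_elements(page_xml) -> dict:
    """Count TextRegion and TextLine elements in a PAGE-XML document.

    `page_xml` may be a str or bytes. Tags are compared by local name
    (the "{namespace}" prefix is stripped), so the PAGE schema version
    does not matter.
    """
    root = ET.fromstring(page_xml)
    counts = {"TextRegion": 0, "TextLine": 0}
    for elem in root.iter():
        # "{http://...}TextRegion" -> "TextRegion"
        local = elem.tag.rsplit("}", 1)[-1]
        if local in counts:
            counts[local] += 1
    return counts


# Hypothetical minimal PAGE document with one region containing one line.
sample = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <TextRegion id="r1"><Coords points="0,0 10,0 10,10 0,10"/>
      <TextLine id="r1_l1"><Coords points="0,0 10,0 10,5 0,5"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(count_page_elements(sample))  # {'TextRegion': 1, 'TextLine': 1}
```

Since ET.fromstring also accepts bytes, the real output can be checked with something like count_page_elements(Path("OCR-D-IMG-SEG/OCR-D-IMG-SEG_00738.xml").read_bytes()) after importing Path from pathlib; an all-zero result would confirm the empty segmentation.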
vahidrezanezhad commented 1 year ago

You have already identified the cause: the wrong column classification causes this problem. Because of the resulting excessive upscaling, the layout models are no longer able to detect the regions (here, no text region is detected at all). To resolve this, the column classifier should be updated (retrained with the failing documents).
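The "Found 6 columns" lines in the log print the column classifier's output as a list of probabilities. A small sketch reading that list shows how the 6-column decision follows from it. It assumes that position i (0-based) stands for i + 1 columns, which is consistent with the highest value sitting at position 5 and the log reporting 6 columns; the helper name predicted_columns is hypothetical.

```python
# Probabilities copied from the "Found 6 columns" log line above.
probs = [0.09252959, 0.01236904, 0.0052445, 0.05544147, 0.01741231, 0.8170032]


def predicted_columns(probabilities):
    """Return the column count of the most probable class.

    Assumption: class index i (0-based) corresponds to i + 1 columns.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1


print(predicted_columns(probs))  # 6
print(f"{max(probs):.1%}")       # 81.7%

# Rank all classes; the two most likely column counts.
ranking = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
print([i + 1 for i in ranking[:2]])  # [6, 1]
```

Under the same assumption, the correct one-column class was only the runner-up at about 9%, so the classifier was confidently wrong rather than undecided, which fits the suggestion to retrain it on failing documents.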

sjscotti commented 1 year ago

Thanks for the speedy reply.
If I understood you correctly, my image is scaled too large, so I tried adjusting it in a couple of ways.

First, I tried changing the dpi parameter to 1100 (-P dpi 1100) in my OCR-D eynollah run, but it still gave the 6-column classification, and I got the same behavior: no text regions detected and no text lines detected.

Then I tried scaling the image in GIMP to 720 x 918 from the original 2203 x 2808 and reran eynollah with the dpi parameter back at my original value of 360. This was also classified as 6 columns, but now it detected 21 text regions, though still no text lines.

Did I misunderstand what is causing this issue, or is there another workaround I can try?

vahidrezanezhad commented 1 year ago

Let me explain how scaling works in eynollah. If the input has a DPI below 300, it is scaled based on the column classifier's result. If the DPI is above 300, or you set it manually to something above 300, no scaling is applied unless you activate allow_scaling. With allow_scaling, the image is scaled based on the detected number of columns, regardless of the DPI. For your image (which is not a real document page; it is more like a text region cropped from a document), you can scale it down (to about 600 x 764), then set the DPI to something above 300 (this cannot be done in standalone eynollah, but in OCR-D it seems to be possible) and drop allow_scaling (so complex :D).
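The rule described above can be modelled roughly as follows. This is a hypothetical sketch of the behavior as explained in this comment, not eynollah's actual code: the function names are invented, and treating exactly 300 DPI as "not low resolution" is an assumption, since the comment only mentions "less than 300" and "more than 300". A small helper for the suggested proportional downscale is included as well.

```python
from __future__ import annotations


def uses_column_based_scaling(dpi: float, allow_scaling: bool) -> bool:
    """Hypothetical model: will the image be rescaled from the column count?"""
    if allow_scaling:
        # allow_scaling: scale by detected columns, whatever the DPI.
        return True
    # Without allow_scaling, only low-resolution input (< 300 DPI) is
    # rescaled; exactly 300 is treated as high resolution (assumption).
    return dpi < 300


def downscale_to_width(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Return (target_width, new_height), preserving the aspect ratio."""
    new_height = round(height * target_width / width)
    return target_width, new_height


print(uses_column_based_scaling(1100, True))   # True
print(uses_column_based_scaling(360, False))   # False
print(downscale_to_width(2203, 2808, 600))     # (600, 765)
print(downscale_to_width(2203, 2808, 720))     # (720, 918)
```

Under this model, uses_column_based_scaling(1100, True) is still True, which would be consistent with -P dpi 1100 having no effect while allow_scaling was enabled. The 600-pixel width gives 764.78 before rounding, i.e. the same ratio as the 600 x 764 suggested above. The returned size could be passed to Pillow's Image.resize if you prefer scripting the resize over GIMP.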

Attached images: unnamed_res and unnamed_res_layout