naptha / tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥
http://tesseract.projectnaptha.com/
Apache License 2.0

rotateRadians and osd output are incorrect #859

Closed wvanrensburg closed 2 months ago

wvanrensburg commented 9 months ago

Version 5.03

When requesting rotateRadians from the recognize API, the value returned is incorrect. See the attached image for an example. The correct text is extracted, and the image is clearly rotated 90 degrees (or 270, or -90, however you put it), but the radians value returned is -0.010256410576403141, which, converted back to degrees, comes out to roughly -0.58 degrees.

Tested the below with both PSM modes of PSM.SPARSE_TEXT_OSD and PSM.AUTO_OSD

Example to reproduce

const { data: { tsv, imageBinary, rotateRadians } } = await scheduler.addJob(
  'recognize',
  `data:image/png;base64,${imagebase64}`,
  { rotateAuto: true },
  { imageBinary: true, rotateRadians: true, tsv: true }
);

When called directly, the tesseract CLI reports the correct orientation:

tesseract /path/to/image.png /path/to/reportout --psm 0

Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 10.40
Script: Latin
Script confidence: 5.40
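
The report above is plain "Key: value" text, and the later comments in this thread read the same layout from `data.osd`. As a minimal sketch, it can be turned into an object for easier comparison between runs. `parseOsdReport` is a hypothetical helper name, not part of Tesseract.js, and it assumes the report keeps exactly this layout.

```javascript
// Hypothetical helper (not part of Tesseract.js): parse the plain-text OSD
// report into an object. Assumes the "Key: value" layout shown above.
function parseOsdReport(report) {
  const fields = {};
  for (const line of report.split(/\r?\n/)) {
    const match = line.match(/^\s*([^:]+):\s*(.+?)\s*$/);
    if (!match) continue; // skip blank or unexpected lines
    const [, key, value] = match;
    const number = Number(value);
    // Keep numeric fields as numbers and everything else (e.g. "Latin") as text.
    fields[key.trim()] = Number.isNaN(number) ? value : number;
  }
  return fields;
}

// Sample copied from the CLI output reported in this issue.
const sample = [
  'Page number: 0',
  'Orientation in degrees: 90',
  'Rotate: 270',
  'Orientation confidence: 10.40',
  'Script: Latin',
  'Script confidence: 5.40',
].join('\n');

const osd = parseOsdReport(sample);
console.log(osd['Orientation in degrees'], osd['Rotate'], osd['Script']);
```

Keeping the original field names as keys avoids inventing a naming scheme the library does not use.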

[Attached image: rotatedimage]

wvanrensburg commented 9 months ago

Adding to this: when osd: true is added to the output options, the OSD result also comes out incorrect when running with PSM.SPARSE_TEXT_OSD and PSM.AUTO_OSD.

Example

const { data: { tsv, imageBinary, rotateRadians } } = await scheduler.addJob(
  'recognize',
  `data:image/png;base64,${imagebase64}`,
  { rotateAuto: true },
  { imageBinary: true, rotateRadians: true, tsv: true, osd: true }
);

This results in:

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.00
Script: Latin
Script confidence: 2.00

When running in PSM.OSD_ONLY mode, the results are correct:

Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 10.66
Script: Latin
Script confidence: 4.33

UPDATE: When running in PSM.OSD_ONLY mode through the scheduler and jobs, I'm getting a 0-degree orientation. It does not work through the scheduler and jobs, but it works through a direct call.

Balearica commented 9 months ago

Thanks for reporting. It sounds like we need to improve the documentation and/or recognition output for auto-rotate and orientation detection, and perhaps add an output field for the total page angle.

The reason why rotateRadians is reported as -0.58 degrees is that rotateRadians is only reporting the angle used by the auto-rotate feature (enabled by rotateAuto: true). The auto-rotate code is distinct from the orientation detection code, with both adjustments being calculated independently at different points in the recognition process.

Orientation detection determines how the page is oriented, with 4 discrete options: 0/90/180/270 degrees. The angle used by auto-rotate is calculated from the slope of the lines of text Tesseract identifies just prior to recognition (after orientation has been corrected). Auto-rotate is intended to adjust the image by +/- 10 degrees and improves results for pages that have been photographed or scanned at an angle. As it stands, calculating the total angle of the page requires combining the angle reported by orientation detection with the angle reported by auto-rotate.
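
To make the combination concrete, here is a minimal sketch using the numbers from the original report. `totalPageAngle` is a hypothetical helper, not a Tesseract.js API. The sign flip on `rotateRadians` follows the convention in the maintainer's test page later in this thread, and the normalization into [0, 360) is an extra assumption added for this example.

```javascript
// Hypothetical helper (not part of Tesseract.js): combine the discrete OSD
// orientation (0/90/180/270 degrees) with the fine auto-rotate angle.
// Assumption: rotateRadians is negated to get degrees in the same direction
// as the OSD orientation, as in the maintainer's test page in this thread.
function totalPageAngle(osdDegrees, rotateRadians) {
  const autoRotateDegrees = -rotateRadians * (180 / Math.PI);
  const total = osdDegrees + autoRotateDegrees;
  // Normalize into [0, 360) so that a small negative total wraps around.
  return ((total % 360) + 360) % 360;
}

// Values from the original report: the CLI said 90 degrees and
// rotateRadians was -0.010256410576403141.
const angle = totalPageAngle(90, -0.010256410576403141);
console.log(angle.toFixed(2)); // 90.59
```

This only holds if the OSD value itself is correct, which is the separate problem discussed in the following comments.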

Balearica commented 9 months ago

> Adding to this.... when adding the osd: true to the options payload, the OSD result comes out incorrect as well running with modes PSM.SPARSE_TEXT_OSD and PSM.AUTO_OSD

I was able to replicate the incorrect osd results you are describing when using the Tesseract.js default oem value of 1 (LSTM_ONLY). However, changing this value to 2 (TESSERACT_LSTM_COMBINED) produces the correct result. oem is the 2nd argument to createWorker, so that looks like the following:

const worker = await Tesseract.createWorker("eng", 2);

I believe that the tesseract CLI application uses a default oem value of 2. Therefore, this would explain why you got different results using Tesseract.js compared to the tesseract CLI program.

Balearica commented 9 months ago

The following snippet contains a minimal test site that reports the total angle of the page, including both orientation and page rotation.

<!DOCTYPE HTML>
<html>
  <head>
    <script src="https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js"></script>
  </head>
  <body>
    <input type="file" id="uploader" multiple>
    <script type="module">

      const worker = await Tesseract.createWorker("eng", 2);
      // PSM 3 = fully automatic page segmentation
      await worker.setParameters({tessedit_pageseg_mode: '3'});

      const recognize = async function(evt){
        const files = evt.target.files;

        for (let i=0; i<files.length; i++) {
          const ret = await worker.recognize(files[i], {rotateAuto: true}, {osd: true});

          const osdAngle = parseFloat(ret.data.osd.match(/Orientation in degrees: (\d+)/)?.[1]) || 0;
          const autoRotateAngle = ret.data.rotateRadians * (180 / Math.PI) * -1;
          const totalAngle = osdAngle + autoRotateAngle;
          console.log("osdAngle: " + osdAngle + " (degrees)");
          console.log("autoRotateAngle: " + autoRotateAngle + " (degrees)");
          console.log("totalAngle: " + totalAngle + " (degrees)");

          console.log(ret.data.text);
        }
      }
      const elm = document.getElementById('uploader');
      elm.addEventListener('change', recognize);
    </script>
  </body>
</html>

This test image is rotated exactly 95 degrees clockwise.

[Attached image: rotate_95_clock]

Results:

osdAngle: 90 (degrees)
autoRotateAngle: 5.017756638841202 (degrees)
totalAngle: 95.0177566388412 (degrees)

matsklevstad commented 2 months ago

Hi! I tried to use your code @Balearica in a plain .html file and run it in the browser, but I get this error. Any suggestions as to why?

[Attached image: Screenshot 2024-07-26 at 08 33 04]

Balearica commented 2 months ago

@matsklevstad Modify the createWorker line to the following and it should run.

const worker = await Tesseract.createWorker("eng", 2, {legacyCore: true, legacyLang: true});

This forces the Legacy code and language data to be downloaded, which is required for oem mode 2 (LSTM + Legacy fallback). This should happen automatically, so the fact that this is not happening is a bug. I will open a separate issue to track this.
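
Until that bug is fixed, one way to avoid forgetting the extra flags is to derive them from the oem value. The sketch below is illustrative only: `workerOptionsForOem` and `OEM_LSTM_LEGACY_COMBINED` are hypothetical names, and it only handles oem 2, the case confirmed in this thread.

```javascript
// Hypothetical helper (not part of Tesseract.js): build the createWorker
// options for a given oem value. Per the comment above, oem 2 (LSTM + Legacy
// fallback) needs the Legacy core and language data; other values are left
// untouched here, which is an assumption of this sketch.
const OEM_LSTM_LEGACY_COMBINED = 2;

function workerOptionsForOem(oem, baseOptions = {}) {
  if (oem !== OEM_LSTM_LEGACY_COMBINED) return { ...baseOptions };
  return { ...baseOptions, legacyCore: true, legacyLang: true };
}

console.log(workerOptionsForOem(2)); // { legacyCore: true, legacyLang: true }
console.log(workerOptionsForOem(1)); // {}

// Intended use (requires Tesseract.js to be loaded, so it is commented out):
// const worker = await Tesseract.createWorker("eng", 2, workerOptionsForOem(2));
```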

matsklevstad commented 2 months ago

Thank you @Balearica, it worked! I tried reading the documentation, but I couldn't find what I was looking for. Is it possible to calculate the angle and then rotate the image so that OCR can read it correctly? Specifically, if the image is upside down, can the system detect this, rotate the image, and then perform OCR?

For example, the total angle is correctly reported to be 270 degrees, but the output does not make sense. Is there a way to force the OCR to rotate the image to 0 degrees and then perform OCR?

[Attached image: Screenshot 2024-07-29 at 08 47 33]

Balearica commented 2 months ago

@matsklevstad Tesseract.js should be capable of detecting and automatically accounting for various orientations and angles when performing recognition. However, the scope of this GitHub Issue is limited to Tesseract.js reporting correct angle information, which does not appear to be the problem here. Therefore, if you are having general issues with orientation detection, please create a new GitHub Discussion with a reproducible example (code + image) using the image rotated 270 degrees that you are working with, and I can take a look.