otiai10 / gosseract

Go package for OCR (Optical Character Recognition) using the Tesseract C++ library
https://pkg.go.dev/github.com/otiai10/gosseract
MIT License

Fails to OCR file that CLI Tesseract handles perfectly #285

Status: Open. noahsilverman opened this issue 1 year ago.

noahsilverman commented 1 year ago


Summary

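Reproducibility

The Tesseract CLI reads the image correctly:

tesseract --psm 13 img.jpg -
1234567

The equivalent gosseract program, which does not return the same text:
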
package main

import (
    "fmt"
    "log"

    "github.com/otiai10/gosseract/v2"
)

func main() {
    client := gosseract.NewClient()
    defer client.Close()

    // "eng" is also the Tesseract CLI's default language.
    if err := client.SetLanguage("eng"); err != nil {
        log.Fatal(err)
    }

    // 13 = raw line, the same mode as `tesseract --psm 13`.
    if err := client.SetPageSegMode(13); err != nil {
        log.Fatal(err)
    }

    // Stop here if the image cannot be loaded instead of running OCR on nothing.
    if err := client.SetImage("img.jpg"); err != nil {
        log.Fatal(err)
    }

    text, err := client.Text()
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(text)
}

Environment

uname -a
Darwin MacBook-Pro.local 22.5.0 Darwin Kernel Version 22.5.0
go version
go version go1.20.7 darwin/arm64
tesseract --version
tesseract 5.3.2
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5.1) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.11 : libwebp 1.3.1 : libopenjp2 2.5.0
 Found NEON
 Found libarchive 3.6.2 zlib/1.2.11 liblzma/5.4.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.4
 Found libcurl/7.88.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.51.0
otiai10 commented 1 year ago

@noahsilverman Is it possible to share the image file with me?

rubiojr commented 8 months ago

I faced a similar problem. In my case it turned out to be image orientation, and rotating the image fixed it. Can the tesseract CLI detect the orientation and rotate the image automatically?