tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Failed to initialise tesseract engine: .net 6.0 [Tesseract 4.1.1 + Tesseract.Data.English 4.0.0] #295

Closed: J35P1N closed this issue 2 years ago

J35P1N commented 2 years ago

Hi, I've recently been looking at Tesseract as a solution for providing OCR for driving licences. However, I cannot seem to get the .NET implementation to play ball with what should be the matching trained data for the English language. My implementation at the moment is super simple: I've created a new .NET 6.0 API project with the following controller code:

[Route("api/[controller]")]
public class OcrController : ControllerBase
{
    public const string folderName = "images/";
    public const string trainedDataFolderName = "tessdata";

    [HttpPost]
    public string DoOCR([FromForm] OcrModel request)
    {

        string name = request.Image.FileName;
        var image = request.Image;

        if (image.Length > 0)
        {
            using (var fileStream = new FileStream(folderName + image.FileName, FileMode.Create))
            {
                image.CopyTo(fileStream);
            }
        }

        string result = "";

        using (var engine = new TesseractEngine(@"./tessdata", request.DestinationLanguage, EngineMode.TesseractAndLstm))
        {
            using (var img = Pix.LoadFromFile(folderName + name))
            {
                var page = engine.Process(img);
                result = page.GetText();
                Console.WriteLine(result);
            }
        }
        return String.IsNullOrWhiteSpace(result) ? "Ocr is finished. Return empty" : result;
    }
}

I've installed the following Nuget packages into the solution:

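[Screenshot of the installed NuGet packages; per the issue title, Tesseract 4.1.1 and Tesseract.Data.English 4.0.0]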

But when it comes to actually running the solution in Swagger or using Postman, as soon as it hits the TesseractEngine initialiser I get a "Failed to initialise tesseract engine.. See https://github.com/charlesw/tesseract/wiki/Error-1 for details." error. I've tried to use this link to diagnose the problem, but to be honest it mostly suggests that there may be a version difference between the language file and the Tesseract version. In this case both are 4.x versions (Tesseract 4.1.1 and Tesseract.Data.English 4.0.0), so they should be compatible.

I've checked the following:

Any assistance with this issue would be greatly appreciated.
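For anyone hitting the same error, one way to narrow it down is to check what the engine would be pointed at before constructing it. The sketch below is only a pre-flight check, not a fix: `TessdataCheck` and `Describe` are hypothetical names (not part of the Tesseract wrapper), and it assumes that the relative path `./tessdata` is resolved against the process's current working directory, that a single language code such as `eng` is passed, and that the language file follows the usual `<language>.traineddata` naming.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; it is not part of the Tesseract wrapper's API.
public static class TessdataCheck
{
    // Reports whether "<tessdataDir>/<language>.traineddata" can be found.
    // Assumption: relative paths resolve against the current working directory.
    public static string Describe(string tessdataDir, string language)
    {
        // Expand the relative path so the message shows the real location on disk.
        string fullDir = Path.GetFullPath(tessdataDir);
        if (!Directory.Exists(fullDir))
        {
            return $"missing directory: {fullDir}";
        }

        // Assumption: one language code (e.g. "eng"), not a combination like "eng+deu".
        string file = Path.Combine(fullDir, language + ".traineddata");
        if (!File.Exists(file))
        {
            return $"missing file: {file}";
        }

        return $"ok: {file} ({new FileInfo(file).Length} bytes)";
    }
}
```

Logging `TessdataCheck.Describe("./tessdata", request.DestinationLanguage)` just before the `TesseractEngine` constructor would show whether the folder was copied to the output directory and whether the language value sent in the request is a code like `eng` rather than, say, `English`.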

FriedrichFroebel commented 2 years ago

How is this related to this repository? This repository contains the data for training the models. Actual engine issues should be reported in the appropriate place: either the base repository if it is a general issue, or the wrapper library you use (in your case the wrapper library seems to live at https://github.com/charlesw/tesseract).

stweil commented 2 years ago

Please use the Tesseract user forum for questions.