Closed ericduarte closed 6 years ago
I assume you've downloaded the language data, provided the path to it, and specified the language in the call to Tesseract.Initialize (the default is English). If the trained data doesn't match the language you want to OCR, the PDF file will be created, but there will be no searchable text in it.
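A quick way to rule out missing language data is to check for the trained data file before calling Tesseract.Initialize. The following is a minimal sketch, not part of the library: it assumes the data path is tessdata\ and the language is eng (as in the example call), and it relies on Tesseract's convention of naming language files <language>.traineddata. The program name and the two constants are hypothetical.

```pascal
program CheckTessData;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Assumed values; use the same ones you pass to Tesseract.Initialize.
  DataPath = 'tessdata\';
  Language = 'eng';

var
  TrainedDataFile: string;
begin
  // Tesseract language files are named <language>.traineddata.
  TrainedDataFile := IncludeTrailingPathDelimiter(DataPath) + Language + '.traineddata';

  if FileExists(TrainedDataFile) then
    WriteLn('Found trained data: ', TrainedDataFile)
  else
    WriteLn('Missing trained data: ', TrainedDataFile);
end.
```

If the file is reported as missing, fix the path or download the matching language data before testing PDF export.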
Please try to OCR your image with the examples\delphi-console-simple example and post your results (is text returned in the console?).
Thanks for your attention.
The examples\delphi-console-simple example works fine, but examples\delphi-console-pdfconvert does not.
if Tesseract.Initialize('tessdata\', 'eng') then
begin
  inputFileName := 'samples\multi-page.tif';
  outputFileName := 'multi-page.pdf';
  if Tesseract.CreatePDF(inputFileName, outputFileName) then
  begin
    WriteLn('PDF was saved succesfully to ' + outputFileName);
    ReadLn;
  end;
end;
Please attach the input image and the output PDF. P.S. Don't copy and paste the example source code; include your actual code (if needed).
I'm using Delphi 10 Seattle and made a small change in tesseractocr.consts.pas. I added this:
{$IFDEF VER300} type PUTF8Char = PAnsiChar; {$ENDIF}
and changed this:
{$IFDEF Use_CPPAN_Binaries}
  libleptonica = {$IFDEF Linux}'libpvt.cppan.demo.danbloomberg.leptonica-1.74.4.so'{$ELSE}'liblept-5.dll'{$ENDIF};
  libtesseract = {$IFDEF Linux}'libpvt.cppan.demo.google.tesseract.libtesseract-master.so'{$ELSE}'libtesseract-4.dll'{$ENDIF};
{$ELSE}
I compressed the image so that I could attach it.
I compared the multi-page.pdf that I'm getting with yours, and I can say without doubt that the issue is in Tesseract. OCR export to PDF is still under development in Tesseract; the latest branch even crashes while trying to save to a PDF file. I've made a copy of the build dated 07-08-2017, which seems to create searchable PDF files:
Thanks for finding this issue. I will monitor Tesseract development and update the precompiled binaries on my server once the issue is fixed in Tesseract.
Worked
Thanks for your help.
Hello,
I've tried to export an image to PDF. It generates the PDF, but the text is not searchable.