tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

tesseract extracting text as html format #3430

Closed nabilalakhani closed 3 years ago

nabilalakhani commented 3 years ago

Hi, I'm new to Tesseract and am using it with Java. I tried extracting text from a .png and a .jpg file, but the output comes back in HTML format. The output is below:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

File

Warning: Invalid resolution 0 dpi. Using 70 instead.

zdenop commented 3 years ago

Please respect the guidelines for posting issues: use the tesseract user forum for questions and support.
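
For readers who hit the same "Invalid resolution 0 dpi" warning from Java, here is a minimal sketch of one way to call the `tesseract` command-line tool with an explicit resolution and read plain text from its standard output. It assumes Tesseract 4 or later is installed and on the `PATH`, and it does not try to explain where the HTML in the question came from. The class and method names (`TesseractCommand`, `buildCommand`, `runOcr`) are hypothetical helpers written for this illustration; they are not part of Tesseract or of any Java wrapper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tesseract.
class TesseractCommand {

    // Builds the argument list: tesseract <image> stdout --dpi <dpi>
    // "stdout" as the output base asks the CLI to print the recognized text
    // instead of writing a file; --dpi supplies the resolution that the
    // image metadata is missing.
    static List<String> buildCommand(String imagePath, int dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return List.of("tesseract", imagePath, "stdout", "--dpi", Integer.toString(dpi));
    }

    // Runs the command and returns whatever Tesseract wrote to standard output.
    // Warnings and errors go to this process's own stderr so they stay visible.
    static String runOcr(String imagePath, int dpi) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(imagePath, dpi))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with status " + exitCode);
        }
        return text;
    }
}
```

Calling `TesseractCommand.runOcr("scan.png", 300)` would then return the recognized text as a string, where `scan.png` and `300` are placeholder values. The right DPI depends on how the image was produced.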