Open GoogleCodeExporter opened 9 years ago
Please see Comment 6 over here for further information:
http://bugs.ghostscript.com/show_bug.cgi?id=695869
Original comment by brian....@gmail.com
on 16 Mar 2015 at 2:14
I am the author of this feature, and did much of my testing with evince. This
is known behaviour. The font itself is doubly invisible; it contains no glyphs
and it is drawn with "invisible ink". Evince is inverting the invisible font
and drawing a solid bar. Depending on your setup this will either be solid
black or perhaps solid orange. As you mentioned, other viewers including Adobe
Reader and the PDF viewer built into Chrome give good results.
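For readers unfamiliar with the "invisible ink" mentioned above: in PDF, text can be drawn with text rendering mode 3 (the `3 Tr` operator), which neither fills nor strokes the glyphs but keeps the text selectable and searchable. The following is a minimal Python sketch of such a content stream fragment. The helper names, the `/F1` font resource name, and the coordinates are assumptions for illustration; this is not Tesseract's actual renderer, which lives in pdfrenderer.cpp and encodes text differently.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    # Backslashes must be escaped first so the escapes added below survive.
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def invisible_text_ops(text: str, x: float, y: float,
                       font: str = "/F1", size: float = 12) -> str:
    """Return PDF content-stream operators that place `text` invisibly.

    `font` is a hypothetical resource name; a real page would have to
    declare it in its /Resources dictionary.
    """
    ops = [
        "BT",                               # begin text object
        "3 Tr",                             # rendering mode 3: no fill, no stroke
        f"{font} {size:g} Tf",              # select font and size
        f"1 0 0 1 {x:g} {y:g} Tm",          # position the text
        f"({escape_pdf_string(text)}) Tj",  # show the string
        "ET",                               # end text object
    ]
    return "\n".join(ops) + "\n"


if __name__ == "__main__":
    print(invisible_text_ops("recognized word", 72, 700))
```

A viewer that honours mode 3 paints nothing for this text, so only the scanned image underneath is visible, while selection and search still work on the hidden layer.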
I view this as a deficiency in evince's handling of invisible fonts overlaid on
an image (which is a natural representation for OCR results). I do not believe
this is a problem with the PDF itself, or with the program that is generating it
(Tesseract).
Recommend filing a feature request with evince.
Original comment by breidenb...@gmail.com
on 20 Mar 2015 at 9:51
I now see a lot of complaints about the embedded font on the Ghostscript bug, so
I am switching my attention over to there.
Original comment by breidenb...@gmail.com
on 20 Mar 2015 at 9:54
I've been reading along with the discussion over on the Ghostscript bug. While
most of it is way over my head, I take it that it could be a while before this
is resolved.
I wonder, would it be trivial to fix this issue in a temporary fork of
Tesseract without support for non-Latin characters? If so, I would definitely
be interested in using such a version in the meantime.
Original comment by brian....@gmail.com
on 26 Mar 2015 at 6:35
Ray committed some code yesterday that seems to deal with this.
Original comment by joregan
on 13 May 2015 at 11:44
Okay, so this update has nothing to do with Evince and
highlighting.
There was a compatibility problem with Ghostscript, though.
This is resolved in the current source tree. Credit
goes to Ken Sharp. He designed a new invisible font that
removes this compatibility problem. I lost my password to
the Ghostscript bug tracking system, so I cannot report the
problem as resolved there. Read all about it here:
https://code.google.com/p/tesseract-ocr/source/browse/api/pdfrenderer.cpp#19
PS. I vaguely remember that Ken said Ghostscript still has some issues
with certain documents, but that the Tesseract PDF files are now 100% valid
as far as he is concerned. So if there is any work left, it is on his side.
Original comment by breidenb...@gmail.com
on 10 Jun 2015 at 6:29
Original issue reported on code.google.com by
brian....@gmail.com
on 16 Mar 2015 at 7:22