Open GoogleCodeExporter opened 9 years ago
Or maybe this application:
http://www.inftyproject.org/en/software.html#InftyReader
Original comment by plutones...@gmail.com
on 4 Feb 2010 at 2:35
Detexify looks like it needs online (drawing) input and therefore won't work
with images.
Infty doesn't seem to be open source, so that won't be of use.
I am open to hosting an intern to work on this topic.
Original comment by theraysm...@gmail.com
on 20 May 2010 at 4:32
Yes, you are correct: Detexify works on splines taken from the drawing. Still,
a first step would be to approximate images with splines and then feed them to
the Detexify engine.
Concerning Google's current strategy of scanning documents and putting them
online, it could be much more efficient to have a truly light electronic
version (with vector-format fonts) instead of a heavy, poorly scanned document
(with raster objects). In this regard, a strategy capable of reconstructing
LaTeX sources from a scanned scientific document could be very powerful.
Original comment by plutones...@gmail.com
on 27 May 2010 at 2:13
I'm trying to train Tesseract to recognize PDF images as LaTeX code.
I think the line-based interpretation will make it a bit complicated for
formulas that span more than one line. For example, a \frac{a}{b} could also be
read as underlined text.
I hope I can find some patterns in the recognized text, so that I can
post-process it somehow.
It shouldn't be too difficult, because I work with PDFs that are LaTeX-generated.
If you have any suggestions for my plans, please share. Thanks.
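A minimal sketch of the kind of post-processing mentioned above, under stated assumptions: it takes bounding boxes of already-segmented glyphs and decides whether a horizontal bar looks like a fraction bar or an underline. The names `Box`, `classify_bar`, and `max_gap` are hypothetical and not part of Tesseract; coordinates are assumed to use a bottom-left origin (y grows upward).

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box with a bottom-left origin (y grows upward)."""
    left: int
    bottom: int
    right: int
    top: int


def overlaps_horizontally(a: Box, b: Box) -> bool:
    """True if the two boxes share some horizontal extent."""
    return min(a.right, b.right) > max(a.left, b.left)


def classify_bar(bar: Box, glyphs: List[Box], max_gap: int = 10) -> str:
    """Guess whether a horizontal bar is a fraction bar or an underline.

    A glyph counts as 'above' or 'below' the bar only if it overlaps the bar
    horizontally and sits within max_gap pixels of it (an assumed threshold).
    """
    above = any(
        overlaps_horizontally(g, bar) and 0 <= g.bottom - bar.top <= max_gap
        for g in glyphs
    )
    below = any(
        overlaps_horizontally(g, bar) and 0 <= bar.bottom - g.top <= max_gap
        for g in glyphs
    )
    if above and below:
        return "fraction"   # content on both sides: numerator and denominator
    if above:
        return "underline"  # content only above the bar
    return "unknown"
```

This is only a heuristic: descenders that cross an underline, or a closely spaced text line below it, would need extra handling, and `max_gap` would have to be tuned to the rendering resolution.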
Original comment by jammi.e...@gmail.com
on 20 Apr 2012 at 1:40
Hi,
My goal: recognizing PDF images as LaTeX code.
My input is clean and not rotated, so it seems to be an easy task. Given
that, I want to tell Tesseract that every black dot is a symbol/letter; there
is no noise.
Is there an easy way to do that, or do I have to dig into the code?
It would also be interesting to know whether Tesseract recognises overlapping
boxes (in the box file), so that a mathematical root would be recognised while
the content under the root line is recognised independently.
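To make the overlapping-box question concrete, here is a small sketch that reads box-file text and lists the pairs of boxes that overlap on the same page. It assumes the plain box-file layout of one `symbol left bottom right top page` entry per line; `BoxEntry`, `parse_box_file`, and `find_overlaps` are hypothetical helper names, not Tesseract functions. It only detects overlaps and says nothing about how Tesseract's training treats them.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class BoxEntry(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[BoxEntry]:
    """Parse lines of the assumed form 'symbol left bottom right top page'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the symbol field is kept in one piece.
        symbol, *numbers = line.rsplit(None, 5)
        entries.append(BoxEntry(symbol, *map(int, numbers)))
    return entries


def find_overlaps(entries: List[BoxEntry]) -> List[Tuple[str, str]]:
    """Return symbol pairs whose rectangles intersect on the same page."""
    pairs = []
    for a, b in combinations(entries, 2):
        if (a.page == b.page
                and a.left < b.right and b.left < a.right
                and a.bottom < b.top and b.bottom < a.top):
            pairs.append((a.symbol, b.symbol))
    return pairs
```

For a root sign whose box encloses the box of the symbol under it, `find_overlaps` would report that pair, which gives a quick way to see how many such cases a training page contains.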
I will report my progress. I hope this is the right place to post.
Thanks.
Original comment by jammi.e...@gmail.com
on 9 May 2012 at 3:24
[deleted comment]
What's the strategy then? Report a bug or request a new feature?
Original comment by plutones...@gmail.com
on 10 May 2012 at 7:04
Actually, I don't want to report a bug.
Maybe an issue about recognising subscripts and superscripts?
I want to recognise formulas and symbols that are not in UTF-8, but in LaTeX.
Tesseract is not built for that, but I want to improve it a bit in that
direction - by training.
Following up on my last comment: I just don't know enough about the training
process to use it wisely.
Greetings
Original comment by jammi.e...@gmail.com
on 14 May 2012 at 3:21
Has there been any progress on this issue? I am interested in scanning
handwritten notes with math equations and converting them to a LaTeX file.
Original comment by maikol.s...@gmail.com
on 12 Jan 2015 at 4:10
Issue 1372 has been merged into this issue.
Original comment by zde...@gmail.com
on 12 Apr 2015 at 3:06
Original issue reported on code.google.com by
plutones...@gmail.com
on 22 Dec 2009 at 6:33