Successfully generated output similar to musicsymols.png; see the attached files.
Tested with version 3.02 on Windows XP (SP3). The traineddata file is also attached.
-sriranga(79yrs)
Original comment by withbles...@gmail.com
on 26 Feb 2012 at 2:26
I have a similar need when processing text lines from within the Audiveris OMR
program.
Audiveris is able to process the staves and music symbols but delegates to
Tesseract the transcription of text lines. Unfortunately, some text lines
happen to contain a music character, for example (see the 2 attached files
also):
- A tempo indication is often written as (J = 69), where 'J' should be the
quarter-note sign
- A guitar chord name with a flat alteration, like Abm for A flat minor, where
'b' should actually be the flat sign.
We have just switched from Tesseract V2 to V3.02. How could we use the training
features of Tesseract to recognize these musical symbols (a very limited
number: quarter, eighth, flat, nothing more)?
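For reference, the symbols mentioned above do have Unicode code points. The sketch below only looks them up with Python's standard unicodedata module; the choice of these particular code points (the Miscellaneous Symbols forms, plus two from the Musical Symbols block) is an assumption, as is the idea that a trained "music" data set would emit them. Nothing here comes from the attached files.

```python
import unicodedata

# Candidate characters for the three symbols discussed in this thread.
# Which of these a trained data set should output is an assumption.
MUSIC_SYMBOLS = {
    "quarter note": "\u2669",
    "eighth note": "\u266a",
    "flat sign": "\u266d",
    "quarter note (Musical Symbols block)": "\U0001D15F",
    "eighth note (Musical Symbols block)": "\U0001D160",
}


def describe(symbols):
    """Return one 'U+XXXX NAME (label)' line per symbol, in dict order."""
    lines = []
    for label, char in symbols.items():
        code_point = f"U+{ord(char):04X}"
        lines.append(f"{code_point} {unicodedata.name(char)} ({label})")
    return lines


for line in describe(MUSIC_SYMBOLS):
    print(line)
```

The first three characters sit in the Basic Multilingual Plane, while the last two need a font that covers the Musical Symbols block.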
Could withblessings@gmail.com give us a hand based on his example?
Thanks
/Hervé (owner of open source Audiveris)
Original comment by herve.bi...@gmail.com
on 16 Jun 2012 at 3:39
Herve,
Please see http://unicode.org/charts/PDF/U1D100.pdf. It appears that
music fonts matching the Unicode chart are not available. Training requires
music fonts, just as English training requires English fonts. If you can
supply music fonts, I shall try to generate a traineddata file for the fonts
you provide.
sriranga(79yrs)
Original comment by withbles...@gmail.com
on 17 Jun 2012 at 6:29
I recently discovered Musica, a free music font that follows the Unicode
code points.
I'm using it to train Tesseract on musical symbols (work still in progress).
See http://users.teilar.gr/~g1951d/ and click on the Musica link.
/Hervé
Original comment by herve.bi...@gmail.com
on 26 Jun 2012 at 7:18
IMO music symbols are not very common.
I would suggest creating a custom "language", as Google did for mathematical
symbols (see the equ package) for tesseract-ocr 3.02. Version 3.02 introduced
simultaneous multi-language capability, so you can run something like this:
tesseract andantino.png andantino -l eng+music
if you create music.traineddata
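A minimal sketch of driving that same command from a script, assuming the tesseract executable is on PATH and that a music.traineddata file has already been placed in the tessdata directory. The helper names build_tesseract_command and run_ocr are hypothetical and only wrap the command line shown above.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Build the argument list for a multi-language run, e.g. -l eng+music."""
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, languages)
    # check=True raises CalledProcessError if Tesseract exits with an error,
    # for instance when music.traineddata is missing.
    subprocess.run(command, check=True)
    return output_base + ".txt"
```

Calling run_ocr("andantino.png", "andantino", ["eng", "music"]) would run the command above and return "andantino.txt".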
Original comment by zde...@gmail.com
on 5 Nov 2012 at 9:43
OK, it sounds like a solution with eng+music :)
Original comment by nikse.dk@gmail.com
on 6 Nov 2012 at 6:06
Original comment by zde...@gmail.com
on 15 Nov 2012 at 8:46
sriranga... I don't suppose you still have the files used in the
"combine_tessdata" command?
Original comment by nikse.dk@gmail.com
on 7 Feb 2013 at 6:04
Sorry, I have deleted everything stored on that drive to make space
for other items.
However, I uploaded the traineddata file under comment no. 1, which can
be used.
Original comment by withbles...@gmail.com
on 8 Feb 2013 at 7:01
OK, thx for the info.
I managed to get it working a bit. Now I just need a few more fonts and also
italic. I'm getting there, but training Tesseract is not super easy...
Original comment by nikse.dk@gmail.com
on 8 Feb 2013 at 7:31
Original issue reported on code.google.com by
nikse.dk@gmail.com
on 20 Jan 2012 at 11:15