myrmoteras opened 5 years ago
Running the specimen count tagger and then clicking through the manual parser with "OK & Next" should fix it ... please run the batch with FM=U to prevent distorting male symbols.
I run it this way:
java -jar -Xmx10240m GgImagineBatch.jar "DATA=E:\diglib\europeanJournalOfTaxonomy\backIssueToBeProcessed\2012" CACHE=./BatchCache FM=U
Ah, OK, then it should work ... unless the male symbol comes without a Unicode mapping. I'll try to think up a solution ... please stop opening tickets about this very font issue; we have half a dozen by now.
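Whether a male sign "comes with a Unicode mapping" can be checked on the decoded text. A minimal sketch, assuming the decoder emits U+2642 (MALE SIGN) for a properly mapped symbol and falls back to a private-use code point or U+FFFD (REPLACEMENT CHARACTER) when the font provides no mapping; the class and enum names are hypothetical and not part of GgImagine:

```java
public class GlyphMappingCheck {

    // U+2642 MALE SIGN and U+2640 FEMALE SIGN
    static final int MALE_SIGN = 0x2642;
    static final int FEMALE_SIGN = 0x2640;

    enum Mapping { SEX_SYMBOL, UNMAPPED, OTHER }

    // Assumption: an unmapped glyph surfaces as U+FFFD or a private-use code point
    static Mapping classify(int codePoint) {
        if (codePoint == MALE_SIGN || codePoint == FEMALE_SIGN) {
            return Mapping.SEX_SYMBOL;
        }
        if (codePoint == 0xFFFD || Character.getType(codePoint) == Character.PRIVATE_USE) {
            return Mapping.UNMAPPED;
        }
        return Mapping.OTHER;
    }

    public static void main(String[] args) {
        int[] samples = {0x2642, 0xE001, 0xFFFD, 'M'};
        for (int cp : samples) {
            // Prints e.g. "U+E001 -> UNMAPPED"
            System.out.printf("U+%04X -> %s%n", cp, classify(cp));
        }
    }
}
```

Only the UNMAPPED case would need image-based recognition; a correctly mapped symbol can be taken at face value.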
Something that might be worth a try is to extract all the rendered glyphs (or at least their 32x32 renderings) from all our IMFs and compile a wider basis for comparison ... I'll try to figure out a way of doing this, maybe also for other characters.
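The idea of a wider comparison basis can be sketched as a map from each character to every 32x32 rendering collected for it, rather than a single reference glyph. This is a hypothetical illustration using only java.awt: the class name, the bilinear downscaling, and the fixed ink threshold of 128 are assumptions, and `square(...)` merely stands in for real glyph renderings extracted from IMFs.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlyphReferenceBase {

    static final int SIZE = 32;

    // Code point -> every normalized rendering collected for it
    private final Map<Integer, List<boolean[]>> exemplars = new HashMap<>();

    // Scale a glyph rendering (dark ink on light background) to SIZE x SIZE and binarize it
    static boolean[] normalize(BufferedImage glyph) {
        BufferedImage scaled = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = scaled.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, SIZE, SIZE); // white background in case the glyph is transparent
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(glyph, 0, 0, SIZE, SIZE, null);
        g.dispose();
        boolean[] bits = new boolean[SIZE * SIZE];
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                // Assumed threshold: anything darker than mid-gray counts as ink
                bits[y * SIZE + x] = scaled.getRaster().getSample(x, y, 0) < 128;
            }
        }
        return bits;
    }

    // Add another rendering of a character instead of replacing the previous one
    void add(int codePoint, BufferedImage glyph) {
        exemplars.computeIfAbsent(codePoint, k -> new ArrayList<>()).add(normalize(glyph));
    }

    int count(int codePoint) {
        return exemplars.getOrDefault(codePoint, List.of()).size();
    }

    // Hypothetical stand-in for a real glyph rendering: a black square on white
    static BufferedImage square(int size, int inset) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, size, size);
        g.setColor(Color.BLACK);
        g.fillRect(inset, inset, size - 2 * inset, size - 2 * inset);
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        GlyphReferenceBase base = new GlyphReferenceBase();
        // Two differently sized "renderings" of the male sign from two fonts
        base.add(0x2642, square(64, 8));
        base.add(0x2642, square(48, 20));
        System.out.println(base.count(0x2642)); // 2
    }
}
```

Keeping every rendering, instead of averaging them into one, preserves the distinct shapes the same symbol takes in different fonts.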
Here is the whole problem finally explained with some visualization ... just look at how different the symbols are, and these are only the ones that were either recognized correctly at least once or corrected after decoding:
Mind the different
All in all, there is just too much variation to handle with a single comparison font glyph, hence the recurring errors.
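To illustrate why a single comparison glyph is not enough, here is a sketch of nearest-exemplar matching by Hamming distance, assuming binarized bitmaps like the 32x32 renderings mentioned earlier in the thread (the demo uses 8-pixel bitmaps for readability). The names and data are made up; this is not how GgImagine actually compares glyphs.

```java
import java.util.List;
import java.util.Map;

public class NearestExemplarMatcher {

    // Count the positions where two equally sized binary bitmaps differ
    static int hamming(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bitmap sizes differ");
        }
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                d++;
            }
        }
        return d;
    }

    // Label of the single closest exemplar; with one exemplar per label
    // this is plain single-template matching
    static String classify(boolean[] candidate, Map<String, List<boolean[]>> references) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (Map.Entry<String, List<boolean[]>> entry : references.entrySet()) {
            for (boolean[] exemplar : entry.getValue()) {
                int d = hamming(candidate, exemplar);
                if (d < bestDist) {
                    bestDist = d;
                    best = entry.getKey();
                }
            }
        }
        return best;
    }

    // Parse "0101..." into a bitmap ('1' = ink)
    static boolean[] bits(String s) {
        boolean[] b = new boolean[s.length()];
        for (int i = 0; i < s.length(); i++) {
            b[i] = s.charAt(i) == '1';
        }
        return b;
    }

    public static void main(String[] args) {
        boolean[] candidate = bits("00111100"); // an unusual male-sign rendering

        Map<String, List<boolean[]>> single = Map.of(
                "male", List.of(bits("11110000")),
                "other", List.of(bits("00011110")));
        System.out.println(classify(candidate, single)); // other (distance 2 vs 4)

        Map<String, List<boolean[]>> wider = Map.of(
                "male", List.of(bits("11110000"), bits("00111000")),
                "other", List.of(bits("00011110")));
        System.out.println(classify(candidate, wider)); // male (distance 1)
    }
}
```

With only one male exemplar, the candidate lands closer to the other glyph; adding a second male rendering to the basis flips the result.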
The image is compiled from screenshots of a little tool I built for what I sketched in my previous comment ... in other words, a remedy is underway.
PS: the one somewhat strange symbol not highlighted in green is an accident from our early days of PDF decoding, when we were still developing the renderers for embedded fonts and had some bugs in that department.
If I run the font and change it, the MCs have the wrong specimenCount. Is there a way to fix this?
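A minimal, hypothetical sketch of why a specimen count can depend on the sex symbols: if counts are derived from patterns like "3♂, 2♀", a male sign decoded as something else (or changed afterwards) silently shifts the total, so the count has to be recomputed once the symbols are final, e.g. by re-running the specimen count tagger as suggested at the top of the thread. The regular expression and method name are assumptions, not the tagger's actual logic.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SexSymbolCount {

    // Hypothetical pattern: an integer followed by a male or female sign,
    // e.g. "3♂" or "2 ♀"; the real tagger is likely more elaborate
    private static final Pattern COUNT = Pattern.compile("(\\d+)\\s*([\u2642\u2640])");

    static int specimenCount(String citation) {
        Matcher m = COUNT.matcher(citation);
        int total = 0;
        while (m.find()) {
            total += Integer.parseInt(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        // Correctly decoded symbols: 3 males + 2 females
        System.out.println(specimenCount("3\u2642, 2\u2640")); // 5
        // Male sign decoded as a private-use character: only the females count
        System.out.println(specimenCount("3\uE001, 2\u2640")); // 2
    }
}
```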