gsautter / goldengate-imagine

Automatically exported from code.google.com/p/goldengate-imagine

font issue zootaxa.4668.1.2 : male symbol wrong, number of specimens wrong: how to correct? #766

Open myrmoteras opened 5 years ago

myrmoteras commented 5 years ago

image

If I run the font and change it, the MCs have the wrong specimenCount. Is there a way to fix this?

gsautter commented 5 years ago

Running the specimen count tagger and then clicking through the manual parser with "OK & Next" should make it right ... please run the batch with FM=U to prevent distorting male symbols.

myrmoteras commented 5 years ago

I run it this way:

java -jar -Xmx10240m GgImagineBatch.jar "DATA=E:\diglib\europeanJournalOfTaxonomy\backIssueToBeProcessed\2012" CACHE=./BatchCache FM=U

gsautter commented 5 years ago

Ah, OK, then it should work ... unless the male symbol comes without a Unicode mapping. I'll try and think up a solution ... please stop opening tickets about this very font issue, we have half a dozen by now.

gsautter commented 5 years ago

Something that might be worth a try is to extract all the rendered glyphs (or at least their 32x32 renderings) from all our IMFs and compile a wider basis for comparison ... I'll try and figure out a way of doing this, maybe also for other characters.
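A minimal sketch of what compiling such a basis could look like, not the actual tool: it assumes each confirmed glyph is available as a 32x32 rendering given as 32 strings of `#` and `.`, and groups the distinct renderings by the character they were confirmed as. The names `to_bits` and `compile_basis` and the input format are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

SIZE = 32  # side length of the glyph renderings (32x32, as mentioned above)


def to_bits(rows):
    """Pack a rendering (32 strings of '#' and '.') into a tuple of row integers."""
    if len(rows) != SIZE or any(len(r) != SIZE for r in rows):
        raise ValueError("expected a 32x32 rendering")
    return tuple(int(r.replace("#", "1").replace(".", "0"), 2) for r in rows)


def compile_basis(samples):
    """Group distinct renderings by the character they were confirmed as.

    samples: iterable of (char, rows) pairs; the input format is an assumption.
    Returns a dict mapping each character to a set of packed bitmaps.
    """
    basis = defaultdict(set)
    for char, rows in samples:
        basis[char].add(to_bits(rows))  # identical renderings collapse into one entry
    return basis
```

Storing each bitmap as a hashable tuple means identical renderings from different documents are kept only once, so the basis grows with the actual variation rather than with the number of documents.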

gsautter commented 5 years ago

Here is the whole problem, finally explained with some visualization ... just look at how different the symbols are, and these are only the ones that have been recognized correctly at least once, or corrected after decoding:

image

Mind the different

All in all just too much variation to handle with a single comparison font glyph, and thus the repeating errors.
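To illustrate why a single comparison glyph falls short, here is a small sketch (an assumption about the general approach, not the decoder's real code) that matches a glyph against every reference rendering of every character and picks the closest one by pixel difference. The function names and the bitmap representation (tuples of row integers) are hypothetical.

```python
def hamming(a, b):
    """Count differing pixels between two bitmaps stored as tuples of row integers."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def best_match(glyph, references):
    """Return (char, distance) for the closest reference rendering.

    references maps each character to an iterable of bitmaps, so a character
    may be represented by many visually different renderings.
    """
    best = None
    for char, bitmaps in references.items():
        for bitmap in bitmaps:
            d = hamming(glyph, bitmap)
            if best is None or d < best[1]:
                best = (char, d)
    return best


# Tiny 2x4 toy bitmaps, purely illustrative.
male_a = (0b1111, 0b0000)
male_b = (0b0000, 0b1111)
female = (0b0011, 0b0011)
glyph = (0b0000, 0b0111)  # a variant close to male_b

single = best_match(glyph, {"♂": [male_a], "♀": [female]})
multi = best_match(glyph, {"♂": [male_a, male_b], "♀": [female]})
```

With only `male_a` as the reference, the toy glyph ends up closer to the female symbol; once `male_b` is added, it matches the male symbol.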

The image is compiled from screenshots of my little tool aimed at what I sketched in my previous comment ... meaning to say: remedy is underway.

gsautter commented 5 years ago

PS: the one somewhat strange symbol not highlighted in green is an accident from our early days of PDF decoding, when we were still developing the renderers for embedded fonts and had some bugs in that department.