Unusable at 1920x1200 and below + Ideas to fix it

Taxxian commented 6 years ago

Hello, thanks for your great tool!

I like your tool but my uncompressed 1920x1200 Steam screenshots produce so many errors that not even a single name in a random game is correct.

My workaround: install ImageMagick and upscale the screenshot to 200%: magick screen.png -filter Hamming -resize 200% screen.png (input and output are the same file). With that, about 80% of the player names are OCR'd correctly.
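For anyone who would rather do the same upscaling inside Java than call an external program, here is a minimal sketch. Assumptions: the class and method names are made up for illustration, and plain Java 2D has no Hamming filter, so bicubic interpolation is swapped in. Whether it helps the OCR as much as the ImageMagick command is untested.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the tool.
final class UpscaleSketch {

    /** Returns a scaled copy of src; factor 2.0 corresponds to "-resize 200%". */
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Assumption: bicubic stands in for ImageMagick's Hamming filter.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

A screenshot loaded with ImageIO.read(...) can be passed through upscale(image, 2.0) before it is handed to the OCR step.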

What you can do: simply read all the player names from the Jarls List and use them as a dictionary for your OCR library. That would probably lead to 99% correct identification.

What else would I want? Some more statistics, like the average Jarls rank per team, would be nice.

happynev commented 6 years ago

hi, thanks for the feedback! I didn't realize it was that bad at 1920x1200 (based on the ~3 screenshots i tested a long time ago)

1) ImageMagick: judging from the wiki page it looks like the right tool for a task like this. I hadn't thought about using an external program before. i do SOME preprocessing inside Java (you can see how that looks on the preview tab - that also shows the raw trace result without dictionary correction). i will have a look at it.

2) I'm kinda ashamed that i didn't think of that myself ;-) But I'll have to ask Scurro whether there's a better way to get the player names than grabbing ~1700 pages. I never planned to import ALL the data.

3) That one i actually thought of before ;-) But i'm pissed at my own data and processing model, because the way the Leaderboard stat is set up i'm unable to calculate an average. i'll have to rework that completely (probably together with a new importer for the leaderboard data)

but sadly i can't put a lot of time into this project (or MWO for that matter), so progress will be slow :-/

Taxxian commented 6 years ago

Hello, I got a Jarls List dump from Scurro, made a wordlist from it and put it in, but it does not seem to be used :-( (I edited the api_config but the OCR does not improve.) Since the wordlist is one word per line and many player names contain whitespace, I am not convinced it would work anyway.

Some ideas to get it working quickly: (I have virtually no experience with Java, so doing it myself would take me a hell of a lot longer than you^^)

Take the 162k player names + ranks + average match scores and put them in a table in your H2 database. (This import can even be a one-time job; there are not that many new players each day...)
Check every tesseracted name against this list.
Maybe simply calculate the Hamming distance between the strings and then subtract something for common OCR errors like 0->o or missing/added whitespace (players like "O C K E" and "J A Y" will always be OCR'd without whitespace, because such names give the OCR no example of normal letter spacing to compare against).
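A rough sketch of how such a lookup could work, under a few assumptions. Hamming distance is only defined for strings of equal length, so Levenshtein (edit) distance is swapped in here, since missing or added whitespace changes the length. Instead of subtracting a bonus for typical OCR errors, both strings are normalized first (whitespace dropped, 0 mapped to o, 1 mapped to l) so those errors cost nothing. The 1->l mapping, the maxDistance cutoff and all class and method names are illustrative, not taken from the tool.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the tool.
final class NameMatchSketch {

    /** Folds case, drops whitespace and maps look-alike digits to letters. */
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) continue; // "O C K E" -> "ocke"
            if (c == '0') c = 'o';                   // OCR confusion from the thread
            else if (c == '1') c = 'l';              // assumed additional confusion
            sb.append(c);
        }
        return sb.toString();
    }

    /** Classic two-row Levenshtein distance (insert, delete, substitute all cost 1). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest known name, or the raw OCR text if none is within maxDistance. */
    static String closest(String ocr, List<String> known, int maxDistance) {
        String key = normalize(ocr);
        String best = ocr;
        int bestDist = maxDistance + 1;
        for (String name : known) {
            int d = levenshtein(key, normalize(name));
            if (d < bestDist) { bestDist = d; best = name; }
        }
        return best;
    }
}
```

With a known list containing "O C K E" and "Taxxian", closest("OCKE", known, 2) picks "O C K E" and closest("Tax x1an", known, 2) picks "Taxxian". A name with no close match is returned unchanged, so a player who is not in the list yet is not silently renamed.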

If you think you won't have any time for this in the next few days, please let me know and I will try to hack something together myself (but it won't be good^^)

Thanks a lot for all your work

Taxxian

happynev commented 6 years ago

hi, i never got it to work via api_config, so all whitelists are set at runtime (though only the character lists go to the tracer; the wordlist is applied manually after tracing). in the beginning i tried "training" tesseract, but soon gave up on it :-/

while i'm glad to see you're enthusiastic, this is also the reason why i never actually made it public. It's more of a "when i'm in the mood" project, and ATM i am not :-(

happynev commented 6 years ago

new update! so in regard to the original points:

1) i reworked the image preprocessing quite a bit. hopefully for the better :-) (while searching on google for some testable HD screenshots i found quite a lot that have been contrast-enhanced or something. screws up the color filters -> screws up the OCR)

2) i thought about it some more and i can't find a way to distinguish between "new" players and OCR fails. Simply changing a name to the "next possible" one would be wrong.

3) Found a better way to integrate the Jarl's data :-) this enables all kinds of calculations on it and therefore lots of new stats :-) including average/median rank and rank per team. if i missed some useful ones, tell me.

don't forget to enable the new stats on the settings tab :-)
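To illustrate the average/median rank per team mentioned above, here is a small sketch. It is not the tool's actual code: the class and method names are made up, and it assumes the caller has already left out players without a Jarl's rank.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the tool.
final class TeamRankStats {

    /** Arithmetic mean of the ranks of one team. */
    static double average(int[] ranks) {
        if (ranks.length == 0) throw new IllegalArgumentException("no ranks");
        long sum = 0;
        for (int r : ranks) sum += r;
        return (double) sum / ranks.length;
    }

    /** Median of the ranks; for an even count, the mean of the two middle values. */
    static double median(int[] ranks) {
        if (ranks.length == 0) throw new IllegalArgumentException("no ranks");
        int[] sorted = ranks.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For a team with the made-up ranks 100, 300, 2500 and 40000 the average is 10725.0 while the median is 1400.0, which shows why the median is useful when one low-ranked player skews the average.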

I'd like to hear back whether this improved the tracing on your screenshots (if not, could you please upload the screenshots somewhere, so i have something concrete to test)

happynev / MwoScoreboardHelper

Unusable at 1920x1200 and below + Ideas to fix it #1