Open Taxxian opened 6 years ago
Hi, thanks for the feedback! I didn't realize it was that bad at 1920x1200 (based on the ~3 screenshots I tested a long time ago).
1) ImageMagick: judging from the wiki page, it looks like the right tool for a task like this. I hadn't thought about using an external program before. I do SOME preprocessing inside Java (you can see how that looks on the preview tab, which also shows the raw trace result without dictionary correction). I will have a look at it.
2) I'm kinda ashamed that I didn't think of that myself ;-) But I'll have to talk to Scurro to see if there's a better way to get the player names than grabbing ~1700 pages. I never planned to import ALL the data.
3) That one I actually thought of before ;-) But I'm pissed at my own data and processing model, because the way the Leaderboard stat is set up, I'm unable to calculate an average. I'll have to rework that completely (probably together with a new importer for the leaderboard data).
But sadly I can't put a lot of time into this project (or MWO for that matter), so progress will be slow :-/
Hello, I got a Jarl's List dump from Scurro, made a wordlist and put it in, but the tool does not seem to be using it :-( (I edited the api_config, but the OCR does not improve.) Since the wordlist is one word per line and many player names contain whitespace, I am not convinced it would work anyway.
Some ideas to get it working quickly: (I have virtually no experience with Java, so doing it would take me a hell of a lot longer than you^^)
If you think you won't have any time for this in the next few days, please let me know and I will try to hack something together myself (but it won't be good^^)
Thanks a lot for all your work
Taxxian
Hi, I never got it to work via api_config, so all whitelists are set at runtime (although these are only character lists for the tracer; the wordlist check is done manually after tracing). In the beginning I tried "training" Tesseract, but soon gave up on it :-/
While I'm glad to see you're enthusiastic, this is also the reason why I never actually made it public. It's more of a "when I'm in the mood" project, and ATM I am not :-(
New update! So, regarding the original points:
1) I reworked the image preprocessing quite a bit, hopefully for the better :-) (While searching Google for some testable HD screenshots, I found quite a lot that had been contrast-enhanced or otherwise edited. That screws up the color filters, which in turn screws up the OCR.)
2) I thought about it some more and I can't find a way to distinguish between "new" players and OCR failures. Simply changing a name to the "next possible" one is just wrong.
3) Found a better way to integrate the Jarl's data :-) This enables all calculations on it and therefore lots of new stats :-) including average/median rank and rank per team. If I missed some useful ones, tell me.
don't forget to enable the new stats on the settings tab :-)
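For anyone curious what the per-team rank stats boil down to, here is a minimal sketch. It is not the tool's actual code: the class name `TeamRankStats` and the plain `int[]` of ranks are assumptions for illustration, and it assumes players without a Jarl's List rank have already been filtered out.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the tool's source.
final class TeamRankStats {

    // Arithmetic mean of the ranks of one team.
    static double average(int[] ranks) {
        if (ranks.length == 0) {
            throw new IllegalArgumentException("no ranks given");
        }
        long sum = 0;
        for (int r : ranks) {
            sum += r;
        }
        return (double) sum / ranks.length;
    }

    // Median rank: middle value, or mean of the two middle values for an even count.
    static double median(int[] ranks) {
        if (ranks.length == 0) {
            throw new IllegalArgumentException("no ranks given");
        }
        int[] sorted = ranks.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

The median is probably the more useful of the two here, since a single very low-ranked player on a team can pull the average a long way.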
I'd like to hear back whether this improved the tracing on your screenshots (if not, could you please upload the screenshots somewhere, so I have something concrete to test).
Hello, thanks for your great tool!
I like your tool but my uncompressed 1920x1200 Steam screenshots produce so many errors that not even a single name in a random game is correct.
My workaround: install ImageMagick and convert the screenshot:
magick screen.png -filter Hamming -resize 200% screen.png
That gets about 80% of the player names OCR'd correctly.
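The same kind of upscale could in principle be done inside Java without an external program. The following is only a rough sketch: Java2D has no Hamming filter, so bicubic interpolation is swapped in here, which means the result will not be identical to the ImageMagick output. The class name `ScreenshotUpscaler` is made up for the example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper; uses bicubic interpolation instead of ImageMagick's Hamming filter.
final class ScreenshotUpscaler {

    // Returns a scaled copy of src; factor 2.0 corresponds to "-resize 200%".
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        // TYPE_INT_RGB drops any alpha channel, which is fine for screenshots.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null); // draw src stretched to w x h
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Loading and saving can be done with `javax.imageio.ImageIO.read` and `ImageIO.write`. Whether bicubic helps the OCR as much as the Hamming filter does would have to be tested on real screenshots.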
What you can do: simply read all the player names from the Jarl's List and use them as a dictionary for your OCR library. That would probably lead to 99% correct identification.
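A sketch of how such a dictionary lookup could work after tracing, assuming the names are matched as whole strings (so whitespace inside a name is no problem) and compared by Levenshtein edit distance. The class `NameMatcher`, the `maxDistance` cap and the sample names are all assumptions for illustration. The cap is one possible way to avoid renaming a player who is simply not on the list yet, though it cannot fully tell a new player apart from a badly garbled trace.

```java
import java.util.List;

// Hypothetical helper, not part of the tool or of any OCR library.
final class NameMatcher {

    // Levenshtein edit distance, computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest known name if it is within maxDistance edits;
    // otherwise returns the raw OCR text unchanged (possibly a new player).
    static String correct(String ocr, List<String> knownNames, int maxDistance) {
        String best = ocr;
        int bestDist = maxDistance + 1;
        for (String name : knownNames) {
            int d = distance(ocr, name);
            if (d < bestDist) {
                bestDist = d;
                best = name;
            }
        }
        return best;
    }
}
```

For example, with made-up names, `NameMatcher.correct("Ir0n Fox", List.of("Iron Fox", "Ghost Bear 7"), 2)` picks "Iron Fox", while a trace that is far from every known name is left as it is. A linear scan over the whole list is the simplest approach and may be too slow for a very large list, in which case some pre-filtering (for example by name length) would be needed.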
What else would I want? Some more statistics, like average Jarl rank per team, would be nice.