Open jpcima opened 4 years ago
A random set of vgmusic files whose encoding is misdetected:
beachcave.mid.gz ff1flcst.mid.gz Mi%27Ihen_Highway.mid.gz realemotion1.mid.gz so2_hurry.mid.gz
Idea of an algorithm for a new heuristic
Let S be an input string of length N
Score ← 0
Counter ← 0
Script ← None
For each codepoint C of S:
    PreviousScript ← Script
    Script ← uscript_getScript(C)
    If Script ≠ PreviousScript:
        Counter ← 0
    Counter ← Counter + 1
    Score ← Score + Counter
Score ← Score / N
I think the encoding detection issues are largely resolved, but I'll drop here some samples which are still failing.
z2ow.mid.gz CP932