myjian / mai-tools

mai-tools is a collection of useful tools for maimai and maimai DX.
https://myjian.github.io/mai-tools
GNU General Public License v3.0
39 stars 11 forks source link

Update mai-tools rating database fetching with respect to upstream changes #83

Open evnchn opened 8 months ago

evnchn commented 8 months ago
image

Can't effectively calculate ratings anymore without the internal level data.

Please check https://sgimera.github.io/mai_RatingAnalyzer/maidx_inner_level_festivalplus.html

Instead of loading, traditionally, maidx_in_lv_[ver].js, now it loads maidx_in_lv_data_[ver].js. Entirely different format

I note that you have used your private copy of maidx_in_lv_[ver].js for festival and universeplus. It may not be a great idea as the constants may change, so this may be a good time to fix both issues at once.

Meanwhile, I am working on a hotfix that involves converting the new format to the old format for your script to use. It will take some time, though. If I have an update, I will pull-request with the modified URLs.

evnchn commented 8 months ago

Hotfix created.

Check https://github.com/evnchn/fix-mai-internal-lv

evnchn commented 8 months ago

Now all that is needed is to test and then use the internal level data from the hotfix.

evnchn commented 8 months ago

Would add that it may be a better idea to pre-generate the "magicVer20" on a server and then simply set the string in localStorage, as in what the userscript hotfix does.

myjian commented 8 months ago

Similar issue happened several times in the past and it was all because sgimera deleted the maidx_in_lv_[ver].js from the repo.

The data file powering the page is separate from the data file powering rating calculation script. I don't know why he chose to do this (using separate files for html page and tooling), but it has been the case since ~3 years ago.

The format of maidx_in_lv_[ver].js did not change much from version to version. Take the latest version, maidx_in_lv_buddies.js for example. It still has dx, v, lv, n, nn, ico fields. The only changes I noticed was changing the quote from using " (double quotation mark) to ` (backtick), and I added support for backtick-quoted strings in c0fc3df3e15bca80ca547a6399046a1a5aacfaeb and 4984954d06b1d6e62c8c3da0f4ce756086eadaa9.

I've been avoiding having a server because of increased complexity of testing, deployment and potential hosting cost (Google Cloud, AWS, Azure, etc.). Surely there is no need to have every user compute magicVerN from raw data, but it is likely not a performance bottleneck (loading scores usually takes longer because we don't want to be rate limited by maimai net). On the other hand, I agree missing data from external source brings correctness issue and it may make sense to use expired cache data if external source is failing. I created #85 for keeping cache in such cases.

I've been avoiding maintaining chart level data because I am not actively investing time in upper level charts. I also don't know whether there are other sources of chart level data that I can load via a fetch call. The copies are made private to prevent search engine indexing and I'll happily accept any contributions. Usually, when I copy a chart level data to Gist, the data is relatively completed by Japan players. I believe not much update is required after I make the copy (I may be wrong because I don't participate in communities much).

myjian commented 8 months ago

Thank you for reporting FESTiVAL PLUS rating calculation being broken. e8a4340b50b74937c6a6f50ce7d07ff8f42ee47a fixed it (by referring to my copy of magic)

Also thank you for starting the conversation. There has been a couple architectural decisions I took but didn't explain it to anyone. I hope writing it down makes it more understandable.

evnchn commented 8 months ago

Let me put my thoughts on the couple of ways of doing it:

1: refering to anything made by sgimera other than the maidx_in_lv_data_[ver].js (which is required for his/her own page to function) may not be the best idea.

Put simply, no one is obliged to maintain some data that they don't use. I would advise against questioning his/her achitectural decisions just because it broke your code.

2: referring to your own copy of maidx_in_lv_[ver].js may not be the best idea.

Not only does that exclude the possibility of updates, but actually from writing the hotfix, I discovered that I can blend the data between versions and potentiall get more accutate results. I do not remember the exact occurances, but a few charts are -12.0 (unsure Lv12) in festivalplus database but 12.0 - 12.6 (definite known internal level) in buddies. Rather than underestimating the charts internal lv, it may be better to use buddies data, provided that the new internal level lands within the old unknown level range.

3: hosting a server may not be a bad idea after all

If there's a fallback, sketchy servers with potentially low reliabilty is a-ok. For instance, mai-speed (image loading boosting extension) relies on a Proxmox instance on just an i3-3110M (yep mobile CPU on a desktop motherboard) running tens of containers with delightfully slow 8GB of DDR3 laptop RAM, all served over internet and power which may go out with no former notice. However, the script falls back to to loading images from SEGA's servers, so it's alright.

On the cost side, should be relatively inexpensive. You can use mine if you want. I'll just write a script to periodically git pull and restart script (during the 3-hour maintainence window I persume) if your repo is ready.

If you would like to write your own processing code to pre-generate magicVer20 string, or even to use my hotfix as a starting ground to digest maidx_in_lv_data_[ver].js, by all means feel free do it. You got my support in terms of previously written code, compute, and perhaps mental support when there's bugs and you feel desperate (we all do sometimes). I may even try as well, if I have time. The ideal outcome is an API endpoint that provides freshly computed magicVer20 strings (safe provided that mai-speed do failover if I go down).

I think I shall leave this issue open as it seems like you are using issues as a to-do list.

evnchn commented 8 months ago

Adding that using the blended data between festivalplus and buddies from my hotfix has ended with -60 rating, when the original rating was an underestimate already.

But it could be caused by my script not validating that the new internal level lands within the old unknown level range. The rating going even more down is likely caused by the new internal level lands outside of the old unknown level range. (consider originally unknown lv12 and got dropped to lv11.5 in buddies)

evnchn commented 8 months ago

Final note: I'll add that https://github.com/evnchn/fix-mai-internal-lv/commit/d5932dc3b5b7c2f890ee3630421f112e2dff8acb you can see there's a bug that throws all the DX exact level data away for my fixer.py. Whether blending internal level data is a good idea remains to be seen, but decoupling the need of maidx_in_lv_[ver].js sure is a good-to-have.

myjian commented 8 months ago

1: refering to anything made by sgimera other than the maidx_in_lvdata[ver].js (which is required for his/her own page to function) may not be the best idea.

Sgimera has his tools that utilize maidx_inlv[ver].js to calculate rating. In my previous reply I should have emphasized that the files were for his tools and pages.

Put simply, no one is obliged to maintain some data that they don't use. I would advise against questioning his/her achitectural decisions just because it broke your code.

I questioned it because it doesn't feel efficient, not because it broke my code. I have the right to complain but I will never hold him accountable for breaking my code.

I didn't know 3 years ago that the _data files will stay while the others don't. I chose the JSON like format because it was easier to parse and song titles matched official titles.

2: referring to your own copy of maidx_inlv[ver].js may not be the best idea.

Re: excluding the possibility of getting updated, it is the same case with sgimera's data, right? When someone wants to contribute chart constants, they need to reach out to who possess the data. It could be sgimera, me, zetaraku, etc.

I was kind of being forced to host my own copy because I didn't have a better solution. If you're providing an endpoint that I can rely on, I'll happily switch the URLs.

Re: blending constants, I agree it can be helpful but it feels a separate topic to me.

3: hosting a server may not be a bad idea after all

That's right. Major reason is lack of motivation. I would not spend money to host a server that is not necessarily needed, for a game I don't play actively.

evnchn commented 8 months ago

Interesting.

I guess since I do not know typescript, that contributing to a more accurate and usable format for the rating data is my best action to take to make mai-tools significantly better. Casually writing userscript extensions will likely just benefit me and my friends.

I am thinking something that is, y'know, actually JSON, so you can just JSON.parse directly. Just like making location.getItem("magicVer20") string.

While we're at it, maybe with additional features such as estimated lower and upper bound for the rating estimation. Like if I have a chart which is 12+ uncertain in festivalplus, and 13.0 in buddies (likely such adjustment just done so that this hard chart don't show up in someone's random by level, think random 段位認定), we may take the chart constant to be 12.9 rather than 12.7 (default for 12+ uncertain)

Also just to clarify that I do not intend to be judgemental for the previous reply. But I will add that indeed using JavaScript rather than JSON for handling data is indeed inefficient, potentially unreliable, and considered unsafe as well. By all means the JSON parser is faster than the JS parser since JS syntax is much more complex, and it can expose a potential security vulnerability if the JS content gets overridden with some malicious content.

Now mai-tools use regex to parse JS which, luckily, did not break so far, but really could at any moment if there's special format. Would not recommend to use JS eval, though.

More info on using JavaScript to handle tossing data around and the security implications: https://en.wikipedia.org/wiki/JSONP