supertuxkart / stk-stats

Tool to record and display statistics about SuperTuxKart clients
MIT License

Performance improvement. Process raw JSON data incrementally. #7

Open vampy opened 9 years ago

vampy commented 9 years ago

Currently, every time maint_graphics.py generates the report data, it removes all the old data and reprocesses everything from scratch. This is inefficient: it takes a lot of CPU power and memory, which is a real problem on low-RAM machines (cough, DigitalOcean VPS).

A better solution would be:

  1. Every time we run the generation script, we keep track of the last ID in the userreport table that was processed.
  2. Every time the generation script is run, it only processes data from the previously remembered ID up to the current latest ID.
  3. Handle the user count for each device by looking up whether the device is already in the table: https://github.com/supertuxkart/stk-stats/blob/master/userreport%2Fmaint.py#L48. Maybe add another column that holds the SHA1 sum of all the columns that identify the device, so that we do not have to add non-clustered keys on all columns in the graphicsdevice table. I am not sure which is faster, though: a non-clustered key on a single SHA1 column, or non-clustered keys on all the columns that uniquely identify a device entry.
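
The three steps above could be sketched roughly as follows. This is only an illustration built on the standard-library `sqlite3` module, not the actual maint.py code: the simplified table layouts, the `maint_state` checkpoint table, the `DEVICE_FIELDS` list, and the `device_key`, `init_schema` and `process_new_reports` helpers are all assumptions made for the example. It also assumes one report per installation, so each new report simply increments the device's count.

```python
import hashlib
import json
import sqlite3

# Hypothetical JSON fields that together identify a device.
DEVICE_FIELDS = ("vendor", "renderer", "os")


def device_key(report):
    """SHA-1 over the identifying fields, used as a single lookup key."""
    joined = "\x1f".join(str(report.get(name, "")) for name in DEVICE_FIELDS)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def init_schema(conn):
    """Create simplified stand-ins for the real tables (assumed layout)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS userreport (
            id INTEGER PRIMARY KEY, data TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS graphicsdevice (
            device_hash TEXT PRIMARY KEY, usercount INTEGER NOT NULL);
        CREATE TABLE IF NOT EXISTS maint_state (
            name TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
        """
    )


def process_new_reports(conn, batch_size=500):
    """Process only reports newer than the stored checkpoint; return the count."""
    row = conn.execute(
        "SELECT last_id FROM maint_state WHERE name = 'graphics'"
    ).fetchone()
    last_id = row[0] if row else 0  # step 1: remembered ID from the last run
    processed = 0
    while True:
        # Step 2: fetch the next small batch after the checkpoint.
        rows = conn.execute(
            "SELECT id, data FROM userreport WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        for report_id, data in rows:
            key = device_key(json.loads(data))
            # Step 3: bump the count if the device exists, else insert it.
            updated = conn.execute(
                "UPDATE graphicsdevice SET usercount = usercount + 1 "
                "WHERE device_hash = ?",
                (key,),
            )
            if updated.rowcount == 0:
                conn.execute(
                    "INSERT INTO graphicsdevice (device_hash, usercount) "
                    "VALUES (?, 1)",
                    (key,),
                )
            last_id = report_id
            processed += 1
        # Save the checkpoint together with the batch it covers.
        conn.execute(
            "INSERT OR REPLACE INTO maint_state (name, last_id) "
            "VALUES ('graphics', ?)",
            (last_id,),
        )
        conn.commit()
    return processed
```

Because the checkpoint is committed with each batch, an interrupted run can resume where it stopped, and memory use is bounded by `batch_size` rather than by the size of the userreport table.
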

qwertychouskie commented 6 years ago

Pertinent discussion on IRC:

@hiker: Whow - do we really have between 10k and 50k android installs???
@Arthur_D: Possibly
@hiker: I am impressed :)
QwertyChouskie: http://addons.supertuxkart.net/stats/ 187,865!
@hiker: Yes, but that's #files downloaded, not #installations
@hiker: We should have a look at our hardware stats :)
@Arthur_D: Issue is that they take too much resources on the server to generate
@hiker: Yes, I know :( That's why I wrote 'should' not 'look here' ;)
@hiker: Maybe we could add some less detailed statistic - just OS and numbers or so
@Arthur_D: From what leyyin told me the main issue is that there's no "diff" feature, so it has to compile everything every time
@hiker: You mean you can't just add more data in by just analysing new data?
@Arthur_D: And that eats up RAM like no tomorrow
@Arthur_D: Yep, unless I misunderstood
@hiker: We should do something about this :P
← s8321414 has quit (Quit: Konversation terminated!)
@Arthur_D: Indeed
Stragus: What about compiling the data and updating the pages once per day or something? Then it doesn't matter if it takes a while
@hiker: That's what we are already doing
QwertyChouskie: Yeah, leyyin[m] you should do that ;P
@hiker: We can't run the daily cron process to update anymore
@Arthur_D: Well we stopped doing that since it crashes

  • Stragus scratches his head pondering how that could be slow
  • hiker is actually wondering the same :)

@Arthur_D: Too high RAM usage
QwertyChouskie: How much RAM does the server have?
@Arthur_D: It's probably doing something incredibly inefficient
@Arthur_D: 1GB if I remember correctly
@hiker: We got the whole stats package from 0ad (iirc)
@Arthur_D: Yep
Stragus: It really doesn't make much sense, you don't have billions of records
@hiker: Maybe we would need to rewrite it ... if we only had the time
Stragus: (and if you did, then you should stream from disk while compiling summaries, not caching the whole thing...)
Stragus: Oh well, it's probably not written in C or C++ so I couldn't really have a look
Stragus: It's very good to have an online reference of GL extension supports, for a lot of people, and the 0 A.D. database is very outdated
Stragus: (Where is the code of that thing?)
@Arthur_D: I think we have it in a GitHub repo for the add-ons site
@hiker: https://github.com/supertuxkart/stk-stats
QwertyChouskie: https://github.com/supertuxkart/stk-stats
Stragus: I'm sure a lot of folks in ##OpenGL would love an updated online database, especially if you have Android in there
QwertyChouskie: oops too late :)
Stragus: Eh, Python
@hiker: grin we might have android in there :P
@hiker: We just don't know
Stragus: So it's a SQL database, that could be read from a C program that outputs web pages
← Tobbi has quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
Stragus: It's the SQL -> static_web_pages step that fails, correct?

  • Stragus is trying to find that in the source

@hiker: I don't know any of the details, leyyon would know
Kitoko: i like how android made this sudden jump to #1 usage hehe
→ swift110 has joined
QwertyChouskie: Stragus https://github.com/supertuxkart/stk-stats/issues/7 may have some useful info
Stragus: Compiling the pages from scratch every time is no big deal, I think the Python is just broken
→ Auria has joined
ⓘ ChanServ set mode +o Auria
Stragus: The code doesn't look catastrophically inefficient, except for buffering everything before writing, but I'm sure you don't have gigabytes of data...
leper: that might be wrong though :P
leper: at least last time someone asked we (0ad that is) had something about 60GB for that thing
Stragus: Oh :)
Stragus: Okay, so it needs to be accumulated/compiled while it's being read from the SQL database, not all buffered
leper: so that might very well be the issue
leper: (we also have someone interested in updating and improving our own copy of that, since the one maintaining that isn't active anymore, nor has been for quite a while)
leper: (and I guess some queries might be improved a lot by changing the DB schema, or just storing the submitted data somewhere, and extracting what is needed into some table)
leper: (which allows fast queries for a few things, and if something else is needed the raw data is still there, it might just take some time to parse it into something that can be queried)
Stragus: I can't write Python but that seems to be pretty simple, to stream the reports from SQL in the _save_devices() loop
leper`: likely, though I'd still suggest (at least that's what I did to that one mentioned above) to store the data in a different way, so quite a few queries could be done with simple sql queries (eg counting OSs)
@hiker: +1 :)
Stragus: The only way to reach gigabytes of data would be if identical reports aren't consolidated (merge an identical report to the existing one, increment some counter)
leper: last I checked the whole submitted data is some JSON blob that's stored in some field
@hiker: Stragus: at least for stk it should report one installation only once
leper: (with our version in some mysql instance that likes to corrupt the db, repairing which takes huge amounts of ram and ages)
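
The streaming idea raised in the log (accumulate summaries while rows are read from SQL, and merge identical values into a counter instead of buffering every report) might look something like the sketch below. It is a hedged illustration using the standard-library `sqlite3` module: the `userreport` table with a JSON `data` column, the `os` field name, and the `count_by_field` helper are assumptions for the example, not the real stk-stats schema or the actual `_save_devices()` code.

```python
import json
import sqlite3
from collections import Counter


def count_by_field(conn, field, chunk_size=1000):
    """Count reports per value of one JSON field, reading rows in chunks.

    Only `chunk_size` rows plus one counter entry per distinct value are
    held in memory at a time, rather than the whole report table.
    """
    counts = Counter()
    cursor = conn.execute("SELECT data FROM userreport")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        for (data,) in rows:
            report = json.loads(data)
            # Identical values are consolidated by incrementing a counter.
            counts[report.get(field, "unknown")] += 1
    return counts
```

A call such as `count_by_field(conn, "os")` would then give the kind of "just OS and numbers" summary mentioned earlier in the log, with memory use that depends on the number of distinct values rather than on the number of reports.
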

qwertychouskie commented 5 years ago

https://github.com/supertuxkart/stk-stats/blob/master/userreport/maint.py#L109 is interesting here, looks like someone was trying to fix this already. Anyone know if it worked and if not, why not? @leyyin

vampy commented 5 years ago

To be honest I can't remember exactly, but afaik it was slowing down the script a lot.

qwertychouskie commented 5 years ago

Better slow than not working, right?

vampy commented 5 years ago

Slow as in it will hog the server for about half a day, so it is technically not working. It needs more improvements, as described in the issue above.

qwertychouskie commented 5 years ago

Could you work on this soon then, please? Or maybe @Alayan-stk-2 wants to learn Python :P