ppy / osu-web

the browser-facing portion of osu!
https://osu.ppy.sh
GNU Affero General Public License v3.0

Support easier full-stack deploys #5924

Open smoogipoo opened 4 years ago

smoogipoo commented 4 years ago

I sometimes need to run full reprocesses of all users to generate updated star ratings and pp values based on new algorithms that the community comes up with.

Until now, I've maintained a fork of this repository which adds the necessary functionality to do this. The changes required can be seen here: https://github.com/ppy/osu-web/compare/master...smoogipoo:pp-tester

The most important requirements are:

A way to import SQL data dumps found on https://data.ppy.sh.

In dependencies/docker/start.sh, I've had to work around some shortcomings of this data.

Integration of the 4 core processes.

This is currently done through setting osu_counts.docker_db_step at each step of the way. All other processes (ppcalc/diffcalc/es-indexer) wait for this to reach different stages before continuing.

I'm open to other synchronisation solutions, such as only running docker instances in order, each starting when the previous one completes. From my investigations, docker-compose actually removed the ability to do this in an update.

There should be automatic index creation after the data import finishes.

Adjust the default environment variables.

Have some way to allow non-logged-in access to beatmap listing search.
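For illustration, the docker_db_step gating described above could be sketched as a small polling helper. This is a minimal sketch and not the code from the fork: the function name wait_for_step, the POLL_INTERVAL and MAX_POLLS variables, and the commented-out query (including the assumed name/count column layout of osu_counts and the db/osuweb/osu connection details) are all assumptions made for the example.

```shell
#!/bin/sh
# Block until a step counter reaches a target value.
# Usage: wait_for_step TARGET COMMAND [ARGS...]
# COMMAND must print the current step as a non-negative integer on stdout.
# POLL_INTERVAL (seconds, default 5) and MAX_POLLS (default 0 = unlimited)
# are hypothetical knobs introduced for this sketch.
wait_for_step() {
    target=$1
    shift
    polls=0
    while :; do
        current=$("$@") || current=
        # Treat empty or non-numeric output as "not ready yet".
        case $current in
            ''|*[!0-9]*) current=-1 ;;
        esac
        if [ "$current" -ge "$target" ]; then
            return 0
        fi
        polls=$((polls + 1))
        if [ "${MAX_POLLS:-0}" -gt 0 ] && [ "$polls" -ge "$MAX_POLLS" ]; then
            echo "timed out waiting for step $target (last seen: $current)" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-5}"
    done
}

# Hypothetical reader; host, user, database and column names are assumptions.
# read_step() {
#     mysql -N -B -h db -u osuweb osu \
#         -e "SELECT count FROM osu_counts WHERE name = 'docker_db_step'"
# }
# wait_for_step 2 read_step && echo "import finished, starting diffcalc"
```

Passing the reader as a command keeps the waiting logic independent of how the step is stored, so the same helper would still work if the counter moved out of the database.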

Other details

Some of the fat in my fork can be completely excluded, such as:

Tom94 commented 4 years ago

Out of curiosity: what's the use-case of computing pp for all modes at once? Their pp calculations are so far detached from each other that I can't imagine ever tweaking all of them at the same time.

Also, since the modes operate on different tables, I don't think there's much saved effort in computing the difficulties and indices for all modes at once as opposed to per-mode.

smoogipoo commented 4 years ago

Just to reduce the complexity of implementation. Right now I'm passing in two environment variables (MODE and MODE_LITERAL) to do exactly what you're saying.

If complexity is a non-issue then that's fine, I'm not against having env variables. Although for cases where data is being imported like this, it's generally the case that sr/pp calculations should take place.

Tom94 commented 4 years ago

Agreed... maybe then it would be a good idea to automate which modes are called by checking which types of data are imported? I.e. only run difficulty calculations for modes in which beatmap data is imported; only run pp calculations for modes in which score data is imported, etc.
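As a rough sketch of that idea, the modes to process could be derived from which score dumps are present in the import directory. The function name detect_score_modes is hypothetical, and the file naming (osu_scores_high.sql for osu!, osu_scores_<mode>_high.sql for the other modes) is an assumption based on the per-mode score table names, so it may not match the actual dump layout.

```shell
#!/bin/sh
# Print, one per line, the modes for which a score dump exists in DIR.
# ASSUMPTION: dumps are named after the per-mode score tables, i.e.
# osu_scores_high.sql and osu_scores_<mode>_high.sql.
detect_score_modes() {
    dir=$1
    for mode in osu taiko fruits mania; do
        if [ "$mode" = osu ]; then
            file="$dir/osu_scores_high.sql"
        else
            file="$dir/osu_scores_${mode}_high.sql"
        fi
        if [ -f "$file" ]; then
            echo "$mode"
        fi
    done
    return 0
}

# Example: only run pp calculation for modes that have score data.
# for mode in $(detect_score_modes "$IMPORT_DATA"); do
#     echo "would run pp calculation for $mode"
# done
```

A similar check against the beatmap dumps could decide whether difficulty calculation needs to run at all.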

smoogipoo commented 4 years ago

An isolated docker container to do absolutely everything here would be perfect. Something like:

export IMPORT_DATA=./my-import-data/
export THREADS=2
docker-compose up data-import
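Inside such a container, the import step itself might look roughly like the helper below. This is only a sketch: import_dumps is a hypothetical name, the dumps are assumed to be plain .sql files sitting directly in IMPORT_DATA, and the mysql connection details in the usage comment are placeholders. THREADS is not used here; it would presumably be passed on to the calculators.

```shell
#!/bin/sh
# Feed every *.sql file in DIR to a command on stdin, in glob (name) order.
# Usage: import_dumps DIR COMMAND [ARGS...]
# Stops and returns 1 as soon as COMMAND fails for a file.
import_dumps() {
    dir=$1
    shift
    for dump in "$dir"/*.sql; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$dump" ] || continue
        echo "importing $dump" >&2
        "$@" < "$dump" || return 1
    done
    return 0
}

# Hypothetical usage; host, user and database names are placeholders.
# import_dumps "${IMPORT_DATA:?IMPORT_DATA must be set}" mysql -h db -u osuweb osu
```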