Support easier full-stack deploys

smoogipoo commented 4 years ago

I sometimes need to run a full reprocess of all users to generate updated star ratings and pp values based on new algorithms that the community comes up with.

Until now, I've maintained a fork of this repository which adds the necessary functionality to do this. The changes required can be seen here: https://github.com/ppy/osu-web/compare/master...smoogipoo:pp-tester

The most important requirements are:

A way to import SQL data dumps found on https://data.ppy.sh.

In dependencies/docker/start.sh, I've had to work around some shortcomings of this data.

[ ] Users are imported from sample_users into phpbb_users.
[ ] sample_users does not contain all the data of phpbb_users. Of particular importance are user_lastvisit and osu_playmode, which I've had to update manually post-import. This could be fixed by including these columns in the data dumps (or by including a cut-down phpbb_users instead of sample_users).
[ ] The data dumps don't include osu_countries, osu_genres, and osu_languages. The default deploy also has no default values for these tables, which causes (/used to cause?) random JS errors on profile + beatmapset pages.
[ ] The data dumps don't include osu_user_performance_rank, which causes (/used to cause?) JS errors when displaying the graph in the profile, and would cause the profile page to crash.
[x] The migrations on phpbb_users list user_id as a mediumint, but should be updated to int to sufficiently support the sample data. (https://github.com/ppy/osu-web/issues/6001)

Integration of the 4 core processes.

[ ] There are 4 significant processes that must run in a set order:
1. Data import (see above).
2. Difficulty calculation (see diffcalc container).
3. PP calculation (see ppcalc container).
4. ES indexing (see esindexer container).

This is currently done through setting osu_counts.docker_db_step at each step of the way. All other processes (ppcalc/diffcalc/es-indexer) wait for this to reach different stages before continuing.
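
A minimal sketch of the kind of gate each downstream container could use to wait on that value. This is not the code from the fork: the function name, the numeric stage values, and the injected `read_step` callable (which would wrap whatever query reads osu_counts.docker_db_step) are all assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_step(
    read_step: Callable[[], int],
    required: int,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll read_step() until it reports at least `required`.

    read_step is a hypothetical callable returning the current value of
    docker_db_step; how it talks to the database is left to the caller.
    Returns the value that satisfied the gate, or raises TimeoutError.
    """
    waited = 0.0
    while True:
        current = read_step()
        if current >= required:
            return current
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"docker_db_step stuck at {current}, needed {required}"
            )
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Fake reader standing in for the database: the step advances on each poll.
    fake_steps = iter([0, 1, 2])
    reached = wait_for_step(lambda: next(fake_steps), required=2, sleep=lambda s: None)
    print(f"gate opened at step {reached}")
```

Under this sketch, ppcalc would wait for the stage that diffcalc sets on completion, and es-indexer for the one ppcalc sets. The timeout makes a stalled earlier stage fail loudly instead of hanging the whole deploy.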

I'm open to other synchronisation solutions, such as running the docker containers in sequence, each starting only when the previous one completes. From what I found, though, docker-compose removed support for this in an update.

There should be automatic index creation after the data import finishes.

[ ] There should be no need to manually run es:index-documents and es:create-search-blacklist to do this.

Adjust the default environment variables.

[ ] .env.example has the default ES configuration commented out, so I'm appending the ES config myself.
[ ] Similarly, I've had to set QUERY_DETECTOR_ENABLED=0 to remove errors that would pop up on beatmapset/profile pages. It would be helpful if there were a ready-to-use production environment that didn't require manual adjustments.

Have some way to allow non-logged-in access to beatmap listing search.

[ ] I used to be able to disable the check in app/Libraries/Search/BeatmapsetSearchRequestParams.php, but this doesn't work anymore. (https://github.com/ppy/osu-web/issues/6121)

Other details

Add ppy/osu-difficulty-calculator, ppy/osu-performance, and ppy/osu-elastic-indexer as submodules so that sr/pp calculations can take place.
- DON'T submodule osu-server as I'm doing (use osu-difficulty-calculator instead).
- Order of processes: osu-difficulty-calculator -> osu-performance -> osu-elastic-indexer.
- In my fork, osu-difficulty-calculator references a local beatmap directory. It could reference a URL instead via "download_path": "https://localhost/osu/{0}", but it would be nice if there was support for that at an nginx/laravel level (perhaps linking to some storage location in the .env or something).
- In the end, osu-difficulty-calculator and osu-performance should not require the mode to be passed in (https://github.com/ppy/osu-performance/issues/113 for osu-performance, and you can omit the option completely for osu-difficulty-calculator).
- Would be nice, but not required, to have these systems support a number of threads as they do in my fork, perhaps via env-variables or .env.

Some of the fat in my fork can be completely excluded, such as:

- mysqld config.
- Adjustments to .env.example.

Tom94 commented 4 years ago

Out of curiosity: what's the use-case of computing pp for all modes at once? Their pp calculations are so far detached from each other that I can't imagine ever tweaking all of them at the same time.

Also, since the modes operate on different tables, I don't think there's much saved effort in computing the difficulties and indices for all modes at once as opposed to per-mode.

smoogipoo commented 4 years ago

Just to reduce the complexity of implementation. Right now I'm passing in two environment variables (MODE and MODE_LITERAL) to do exactly what you're saying.

If complexity is a non-issue then that's fine, I'm not against having env variables. Although for cases where data is being imported like this, it's generally the case that sr/pp calculations should take place.

Tom94 commented 4 years ago

Agreed... maybe then it would be a good idea to automate which modes are called by checking which types of data are imported? I.e. only run difficulty calculations for modes in which beatmap data is imported; only run pp calculations for modes in which score data is imported, etc.
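
A rough sketch of that selection rule, assuming the importer can report how many beatmap and score rows it loaded per mode. The function name, the mode names, and the shape of the two count mappings are hypothetical; nothing here reflects how the dumps are actually inspected.

```python
from typing import Dict, List, Mapping, Sequence

# Assumed mode identifiers; adjust to whatever the import step reports.
MODES: Sequence[str] = ("osu", "taiko", "fruits", "mania")


def plan_calculations(
    beatmap_counts: Mapping[str, int],
    score_counts: Mapping[str, int],
    modes: Sequence[str] = MODES,
) -> Dict[str, List[str]]:
    """Pick which modes each calculator should run for.

    Difficulty calculation runs for modes that had beatmap rows imported;
    pp calculation runs for modes that had score rows imported.
    Modes missing from a mapping are treated as having no imported data.
    """
    return {
        "diffcalc": [m for m in modes if beatmap_counts.get(m, 0) > 0],
        "ppcalc": [m for m in modes if score_counts.get(m, 0) > 0],
    }


if __name__ == "__main__":
    plan = plan_calculations(
        beatmap_counts={"osu": 1200, "taiko": 0},
        score_counts={"osu": 50000, "mania": 300},
    )
    print(plan)
```

With a rule like this, the MODE / MODE_LITERAL variables would only be needed as an optional override rather than a required input.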

smoogipoo commented 4 years ago

An isolated docker container to do absolutely everything here would be perfect. Something like:

export IMPORT_DATA=./my-import-data/
export THREADS=2
docker-compose up data-import
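
For illustration only, an entrypoint inside such a container might read those two variables along these lines. The `ImportConfig` type, the `load_import_config` name, and the default of one thread when THREADS is unset are assumptions, not part of any existing tooling.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class ImportConfig:
    import_data: Path  # directory holding the extracted data dump
    threads: int       # worker threads for the sr/pp calculators


def load_import_config(env: Mapping[str, str] = os.environ) -> ImportConfig:
    """Build the config from IMPORT_DATA and THREADS, failing early on bad input."""
    raw_dir = env.get("IMPORT_DATA")
    if not raw_dir:
        raise ValueError("IMPORT_DATA must point at the data dump directory")

    raw_threads = env.get("THREADS", "1")  # assumed default: single-threaded
    try:
        threads = int(raw_threads)
    except ValueError:
        raise ValueError(f"THREADS must be an integer, got {raw_threads!r}") from None
    if threads < 1:
        raise ValueError("THREADS must be at least 1")

    return ImportConfig(import_data=Path(raw_dir), threads=threads)


if __name__ == "__main__":
    # Uses an explicit mapping so the demo does not depend on the real environment.
    config = load_import_config({"IMPORT_DATA": "./my-import-data/", "THREADS": "2"})
    print(config)
```

Validating both values up front means a typo in the environment stops the run before the import begins, not partway through.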
