osuripple / cheesegull

CheeseGull is an osu! mirror developed for Ripple.
MIT License

Better Caching System #15

Open thehowl opened 6 years ago

thehowl commented 6 years ago

We need to update our caching system, because osu! asked us to make fewer requests. We thus need to find a better way to cache beatmaps and serve them.

I'm mostly creating this issue to document what the plan is for the Ripple beatmap mirror and how we're going to solve the problem. Also, I want to clear up the process I have in my head, so that I can then proceed to write the code.

Problem

We need to limit requests to osu! as much as possible: at most 10,000 requests per month for ranked beatmaps (in a first phase; we'd then need to gradually scale that down to ~2,000), and 600 requests per month for unranked beatmaps. That allows about 20 unranked beatmaps per day, or roughly one every 1h12m. If possible, we'd also need to gradually scale down the unranked beatmap allowance, as that's very expensive for osu! to do.
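The arithmetic behind these allowances can be checked with a few lines of Python. This is only a sketch: the 30-day month and the name `request_interval` are assumptions made for illustration, not part of CheeseGull.

```python
from datetime import timedelta

DAYS_PER_MONTH = 30  # assumption: a 30-day month, as the figures above imply


def request_interval(requests_per_month: int) -> timedelta:
    """Return the average spacing between requests for a monthly budget."""
    per_day = requests_per_month / DAYS_PER_MONTH
    return timedelta(days=1) / per_day


unranked = request_interval(600)
print(600 / DAYS_PER_MONTH)  # unranked requests per day
print(unranked)              # average time between unranked requests
```

The same helper can be pointed at the ranked budget (10,000, later ~2,000 per month) to see how tight the spacing becomes as the allowance is scaled down.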

Solution

Beatmaps are served on three levels:

CheeseGull will keep all of its discovery code, but will additionally remove stale beatmaps from its own cache and from Wasabi whenever they're discovered to be stale.
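A minimal sketch of that eviction step, in Python. Everything here is hypothetical: the function name, the dict-like stores standing in for the local cache and the Wasabi bucket, and above all the staleness rule itself (a set is treated as stale when the upstream last-update time is newer than the cached one), which this issue does not spell out.

```python
from datetime import datetime


def evict_if_stale(set_id, upstream_last_update, cached_last_update, stores):
    """Delete beatmapset `set_id` from every store if upstream is newer.

    `stores` is any iterable of dict-like objects keyed by beatmapset ID;
    here they stand in for the local cache and the Wasabi bucket.
    Returns True when the set was considered stale and evicted.
    """
    if upstream_last_update <= cached_last_update:
        return False  # cached copy is still current
    for store in stores:
        store.pop(set_id, None)  # tolerate stores that never had the file
    return True


# Hypothetical data: the same outdated archive sits in both stores.
local_cache = {1337: b"old .osz bytes"}
wasabi = {1337: b"old .osz bytes"}
evicted = evict_if_stale(
    1337,
    upstream_last_update=datetime(2018, 5, 2),
    cached_last_update=datetime(2018, 5, 1),
    stores=[local_cache, wasabi],
)
print(evicted, local_cache, wasabi)
```

Evicting from both stores in one place keeps them from disagreeing about which version of a set is current.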

thehowl commented 6 years ago

Another approach: Decentralization

tl;dr: we can't because there's no verification mechanism. This is mostly just a braindump.


Another way to solve this would be decentralization. We could use a protocol of our own, or rely on IPFS or BitTorrent; anything would do. We could form a sort of "alliance" of people who rely on the mirror (mostly other Ripple-based osu! private servers), all agreeing to take part in the decentralization effort by offering space or asking our users. This might seem a bit weird, but it would solve quite a few issues:

The Verification Problem

There is a problem, though: how can you know that a beatmap file is actually the same as the one that osu! gives? The answer right now is: you download the beatmap from the osu! website and check the contents. There is no checksum or signature of the .osz file, so if you want to verify it, you run into the original problem again, which is lowering requests to osu!.

There is indeed a way to verify .osu files, because the osu! API provides a file_hash, which we can just use to verify them. That's one big part of the problem solved, except that a beatmapset is not composed of .osu files alone: it must have a song and a background, and may have hitsounds, a skin and a video. How do we verify all of them?
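As a rough illustration of the .osu check, here is a Python sketch. It assumes the hash is a hex MD5 digest of the raw file bytes (osu! API v1 exposes such a value as `file_md5`, which is presumably what file_hash refers to here); the function name `osu_file_matches` is made up for this example.

```python
import hashlib


def osu_file_matches(contents: bytes, expected_md5: str) -> bool:
    """Compare the MD5 of a .osu file's bytes against the API-provided hash."""
    actual = hashlib.md5(contents).hexdigest()
    return actual == expected_md5.lower()


sample = b"osu file format v14\n"
expected = hashlib.md5(sample).hexdigest()  # stand-in for the API value
print(osu_file_matches(sample, expected))                # unmodified file
print(osu_file_matches(sample + b"tampered", expected))  # altered file
```

Nothing comparable is available for the audio, background or other assets, which is exactly the gap described above.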

As a note: we could ask on the osu-api issues to add the osz checksum, but as anyone who's been around in the osu! development community for long enough will know, asking something on osu-api is like shouting into a void.

So, we have no means of doing proper verification on the file contents.

The reason we need this is that I wouldn't want Ripple to be the single point of contact with osu!. I'd like other servers (or even users!) to be able to join in and add more beatmaps, either by linking to the osu! API for verification or, even better, by having the beatmap file PGP-signed (or signed with any other mechanism) by osu!, ideally together with the beatmap's metadata.

We don't want to trust other servers when they say "Hey, look, this comes from osu!" without any verification. Sure, we could unzip everything and check the osu files and check all the beatmap information we can obtain via the osu! API, but there would be no mechanism to check the song file or any of the other files I listed previously.
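The "unzip everything and check the osu files" idea could look roughly like the Python sketch below. It assumes an .osz is an ordinary zip archive and that the trusted hashes are hex MD5 digests obtained from the osu! API; `audit_osz` and its report format are hypothetical. The point of the sketch is the third bucket: every non-.osu file ends up there because there is nothing to check it against.

```python
import hashlib
import io
import zipfile


def audit_osz(osz_bytes: bytes, trusted_md5s: set) -> dict:
    """Sort the entries of an .osz archive by what can be checked.

    Returns a dict with three lists of file names:
    'verified'     .osu files whose MD5 is in `trusted_md5s`
    'mismatched'   .osu files whose MD5 is not
    'unverifiable' everything else (audio, images, video, ...)
    """
    report = {"verified": [], "mismatched": [], "unverifiable": []}
    with zipfile.ZipFile(io.BytesIO(osz_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(".osu"):
                report["unverifiable"].append(info.filename)
                continue
            digest = hashlib.md5(archive.read(info)).hexdigest()
            key = "verified" if digest in trusted_md5s else "mismatched"
            report[key].append(info.filename)
    return report


# Build a tiny in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("easy.osu", b"osu file format v14\n")
    archive.writestr("audio.mp3", b"not really audio")
trusted = {hashlib.md5(b"osu file format v14\n").hexdigest()}
print(audit_osz(buffer.getvalue(), trusted))
```

A server receiving a set from an untrusted peer could reject anything with a non-empty 'mismatched' list, but it would still have to take the 'unverifiable' files on faith.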

Thus

OP is the best solution, at least until some verification is in place. sigh

MaxKruse commented 6 years ago

From a gameplay standpoint, it is not necessary to verify:

These can be changed by users however they like without affecting the submission of any scores they set. Nothing currently stops people from changing these files, since doing so does not affect score submission. It actually happens quite a lot: some people delete the song files because they dislike them for whatever reason, change the BG of all maps to grey (rrtyui flashbacks), or swap the custom hitsounds provided with the mapset for others.

Therefore, we don't need to verify these. It is completely fine to check only the file_hash of each .osu file.

thehowl commented 6 years ago

Yes, I sort of considered that. It is important, however. We could even discard everything else, BUT we'd still need to check the music file. Mixing up music files is really not an option.

Besides, if all we could guarantee to be correct were beatmap files, then we could just serve data from osu.ppy.sh/osu/:id, with a random picture and a random mp3 file.

thehowl commented 6 years ago

For future reference and for those reading along, I'll leave the counterargument I laid out while talking with @ilyt on Discord:

there's quite a few issues with that, first of all peppy won't be happy with it because you'd be downloading unranked beatmaps too. he said that if I wanted to do that I'd have to pay $500 upfront for the costs that they have

ilyt - Today at 21:32
ahh ok it'd only be occasionally downloading them (with the exception of the start)

Howl - Today at 21:33
i'm not sure if downloading beatmaps straight after they are updated is ok for him (in the sense of whether they have them on amazon or on their download servers) but i think they probably don't go on the download servers straight after being updated, instead they are just placed on amazon until an user requests them

ilyt - Today at 21:33
ahh

Howl - Today at 21:33
which would probably mean hosting a mirror is an ongoing cost for them, which will probably become an ongoing cost for you i don't know which one of the two they do. i tried to ask, but I never got a response

ilyt - Today at 21:34
hmm because thats really the only feesable way of doing it (in my head) is having them all stored on your own server and then deleting upon ranking

Howl - Today at 21:36
the rest of your idea is basically what i explained on the first two posts, mostly the first though, mostly with more technical stuff on where to store the data so that we don't have to get an expensive server with ~4 TB of space (at least) If I could host at home or have the server which I can modify on my own, that probably wouldn't be an issue since I can just get the server and place a large disk in it, then there's no cost after that

ilyt - Today at 21:36
hmm it really is only your first answer that would work

Howl - Today at 21:38
but with hosting providers they generally end up increasing other specs apart from storage, which winds up being huge per-month costs when really it's just storage that you need