Addition of replay parsing functionality

chudooder commented 8 years ago

This is more of a checklist for me as I move forward with the replay parsing integration, but there are a few things I want to start a discussion on, since access to the official data sources could change the flow of the user experience.

Functionality spec

See here for my website: http://osureplay.com/ And here for the parsing code: https://gist.github.com/chudooder/ac6b2d55fe0905b096eb2100cc5437cf

Uploading replays

User goes to upload page, chooses a replay from their file system, then submits.
System passes the replay to the backend parser. This step checks the replay file for the beatmap ID, then makes an API call to obtain the information necessary to properly emulate the game. It also downloads the raw beatmap file (.osu) if it is missing from the file system.
The parser simulates the game running for each input in the replay file, gathering data on hits, misses, timing information, and other things.
The parser outputs this information in JSON format, which is saved to the database. This data is then sent back to the client and the information is displayed.
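As a rough illustration of the last two steps, the sketch below aggregates per-object results into a JSON summary. The event fields (`result`, `error_ms`) and the summary keys are assumptions made for this example, not the parser's actual output format.

```python
import json
from collections import Counter
from statistics import mean


def summarize(events):
    """Aggregate per-object hit events into a JSON-serializable summary.

    Each event is a dict with a hypothetical shape:
    {"result": "300" | "100" | "50" | "miss", "error_ms": float or None}
    """
    counts = Counter(event["result"] for event in events)
    # Misses carry no timing error, so only hits contribute to the mean.
    errors = [e["error_ms"] for e in events if e["error_ms"] is not None]
    return {
        "counts": dict(counts),
        "mean_error_ms": mean(errors) if errors else None,
        "num_objects": len(events),
    }


# Hypothetical simulation output for three hit objects.
events = [
    {"result": "300", "error_ms": -4.0},
    {"result": "100", "error_ms": 31.0},
    {"result": "miss", "error_ms": None},
]
summary_json = json.dumps(summarize(events), sort_keys=True)
```

The resulting string is what would be stored in the database and returned to the client for display.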

Searching replays

User can search for replays based on the player name, beatmap name, date, difficulty, etc.
Upon receiving a list of search results, the user can select one, bringing them to the replay summary which displays the information that was calculated when the replay was uploaded.

Stuff to do now

The parser is not perfect; there are numerous reported cases where hits will register as misses and vice versa. Someone with more intimate knowledge of the game mechanics would be able to help me with this. One major culprit is the note stacking system, which the parser does not currently account for. The goal is to achieve 1:1 emulation of the game engine so that the parsed data is as accurate as possible.

It would obviously be very expensive to perform parsing on every single play. We have a few options on how we choose which replays to parse:

Top scoring replays (the ones marked with stars on the user's dashboard) are permanently stored. It would make sense to parse and store replay summaries for these.
Automatically parse plays that break the user's previous record.
Provide an avenue for users to manually submit replays for parsing, like the current system on osu!replay.

Let's start a discussion on which endpoints we'd like to open up.

I'm no designer. The osu!replay site is functional, but it doesn't look anything like our current theme for the new website. If someone could whip up concepts for how to display the different types of charts found on the summary pages, I'd be happy to implement them.

Future possible cool stuff

Being able to upload, search, and view summaries for replays all from the game client would be amazing, though it may qualify as a supporter-only feature at that point.
Track the user's progress on a variety of fields over time, such as accuracy (score), precision (where they click on the circle), unstable rate, star difficulty, etc.
Show recent parsed plays on the main page, or on a "Plays" page that displays links to recent replays by top players or high scoring replays in general.
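For the progress-tracking idea, unstable rate is commonly described in the community as the standard deviation of hit timing errors in milliseconds, multiplied by 10. A minimal sketch under that assumption (the function name and input shape are hypothetical):

```python
from statistics import pstdev


def unstable_rate(hit_errors_ms):
    """Return 10x the population standard deviation of hit errors (ms).

    Assumes the community definition of unstable rate; misses are expected
    to be excluded from hit_errors_ms by the caller.
    """
    if len(hit_errors_ms) < 2:
        return 0.0  # a single hit has no spread
    return 10.0 * pstdev(hit_errors_ms)
```

Storing one such value per parsed replay would be enough to chart the trend over time.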

peppy commented 8 years ago

So just a bit of feedback:

I wouldn't allow uploading of replays at all. They would be sourced from the replays stored on the servers instead. If people want to upload their replays, they should continue using osureplay.com.
Parsing can be done on page load (and cached to redis) for a period of time. There's no need to do parsing ahead-of-time unless it is to be used for other purposes (automated analysis for anti-cheat or other stats, for example).
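The parse-on-load approach could look roughly like the sketch below. It uses a small in-memory class as a stand-in for Redis so the example is self-contained; `TTLCache`, `get_or_compute`, and the key naming are assumptions for illustration only.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() if absent or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = compute()  # cache miss or expired: parse again
        self._store[key] = (now + self.ttl, value)
        return value
```

With Redis itself, setting the key with an expiry (`SET key value EX seconds`) gives the same behaviour without the bookkeeping.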

I've let @arflyte know about this system and he is working on a design to fit the new site.

chudooder commented 8 years ago

I think the most straightforward solution is to add a route like /r/, then link to those pages from other places like the beatmap page's top scores and user profile top scores. I'll add a link to my current parsing code to the OP in case anyone wants to take a look. It's written in Python and is a bit messy (and buggy).

peppy commented 8 years ago

For what it's worth, I believe we're doing away with the short links (/b/, /s/, etc.) in favour of properly structured ones. But that is an easy change to make later on too.

Do you think the analysis code would be easy to port to PHP? If not, we can probably just use the Python version for now, running a processing queue.

chudooder commented 8 years ago

The replay file needs to be handled like the other .db files (byte-by-byte parsing) and contains an LZMA string which needs to be decompressed. If libraries for those exist then it's definitely possible to port the code, but it'll take some time since I'm not used to working in PHP. And I'd like to fix those remaining phantom miss bugs in a language I'm more familiar with before porting anyway.
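To give a sense of what the byte-by-byte parsing and LZMA step involve, here is a minimal Python sketch. It assumes the commonly documented layout: strings stored as a marker byte (0x00 for empty, 0x0b followed by a ULEB128 length and UTF-8 bytes), and a decompressed payload of comma-separated `w|x|y|z` frames (time delta in ms, cursor x, cursor y, key bitmask). The function names are hypothetical and this is not the code from the gist.

```python
import io
import lzma


def read_uleb128(stream):
    """Read an unsigned LEB128 integer from a binary stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a ULEB128 value")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result
        shift += 7


def read_string(stream):
    """Read a string: 0x00 = empty, 0x0b = ULEB128 length + UTF-8 bytes."""
    marker = stream.read(1)
    if marker == b"\x00":
        return ""
    if marker != b"\x0b":
        raise ValueError(f"unexpected string marker {marker!r}")
    length = read_uleb128(stream)
    return stream.read(length).decode("utf-8")


def decode_frames(compressed):
    """Decompress the LZMA payload into (dt_ms, x, y, keys) tuples."""
    text = lzma.decompress(compressed).decode("ascii")
    frames = []
    for part in text.split(","):
        if not part:
            continue  # tolerate a trailing comma
        w, x, y, z = part.split("|")
        frames.append((int(w), float(x), float(y), int(z)))
    return frames
```

Any port would need equivalents of these three pieces: a variable-length integer reader, the string reader built on it, and an LZMA decompressor that accepts the legacy `.lzma` container.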

peppy commented 8 years ago

As long as you are happy with open-sourcing the py version under a license we can use then that should be fine for now :).

ppy / osu-web

Addition of replay parsing functionality #675
