ppy / osu

rhythm is just a *click* away!
https://osu.ppy.sh
MIT License
15.07k stars 2.23k forks source link

`HitResults` are not calculated properly at the hit-window edges compared to osu!stable. #11311

Open abstrakt8 opened 3 years ago

abstrakt8 commented 3 years ago

Describe the bug:

For the OsuRuleset (did not research/test for other rulesets) I have found a bug as described in the title.

Example:

Given OD=10, which means the hit-window for Meh (50) is supposed to be +-59.5ms:

In Stable: +59ms => Ok, +60ms => Meh In Lazer: +59ms => Ok, +60ms => Ok which is incorrect.

This is demonstrated in one score of mine:

Side by side comparison: https://www.youtube.com/watch?v=IQGsfzOEyJY

  1. If you skip to the end you can already see that there is a big accuracy difference.
  2. At timestamp 1:57 in the last stream: There is supposed to be three consecutive 50's which is calculated correctly in Stable but not in Lazer.

I used gosumemory to calculate the hit offsets of this replay and at the suspicious note (=6th last note of the map) I hit the note with exactly +60ms offset. (https://gist.github.com/abstrakt-osu/46997fa9d203564ed1f1b1eee09a18ba#file-gosu-json-L894).

Screenshots or videos showing encountered issue:

Side by side comparison: https://www.youtube.com/watch?v=IQGsfzOEyJY

Beatmap: https://osu.ppy.sh/beatmapsets/1010865#osu/2117132 Score (+Replay): https://osu.ppy.sh/scores/osu/3321687146

Stable replay: https://youtu.be/FDhGuZh3P_U Lazer replay: https://youtu.be/Ositc3vu_bg

osu!lazer version: 2020.1225.0

Logs:

bdach commented 3 years ago

Given the following, I'm not surprised that lazer is giving a hit window of 60ms:

https://github.com/ppy/osu/blob/b448fd1b8b7441644bdeff72080e984dd838d33b/osu.Game.Rulesets.Osu/Scoring/OsuHitWindows.cs#L10-L16

https://github.com/ppy/osu/blob/b448fd1b8b7441644bdeff72080e984dd838d33b/osu.Game/Beatmaps/BeatmapDifficulty.cs#L45-L53

Not sure where the 59.5ms figure cited above is coming from.

abstrakt8 commented 3 years ago

Not sure where the 59.5ms figure cited above is coming from.

Took the value from https://osu.ppy.sh/wiki/en/Beatmap_Editor/Song_Setup#overall-difficulty . When hovering over the beatmap difficulty stats in stable you can also see the hit window for 100 being +-59.5ms. Of course it could be implemented differently internally.

peppy commented 3 years ago

this may be an intentional fix, for what it's worth. will need further discussion.

smoogipoo commented 2 years ago

The .5 figure is coming from the fact that osu!stable uses a strict < instead of lazer's <=. That is to say, in osu-stable terms, <60 is equivalent to saying <= 59.5 (or a timing window of 59.5ms) because the resolution is 1ms. Thus it's also song select's way of showing that this is the case as well.

Of note: In osu-stable, osu!mania uniformly uses <=, and I believe osu!taiko also uses <= for the normal hit (strongs are still <).

This could be fixed by subtracting 1 from these values, however I propose we keep them as-is and touch upon this in the future. I've labelled this issue as such.

bdach commented 2 years ago

The compatibility is one thing, but the replay playback errors are another. LegacyBeatmapEncoder applies a rounding operation to replay frames:

https://github.com/ppy/osu/blob/46be6bcadcfd8e977d5e87a00d7a29339f876fd4/osu.Game/Scoring/Legacy/LegacyScoreEncoder.cs#L126-L127

This means that, for instance, with a timing window of 60ms, if you hit a note at +59.6ms, it will be a 300 in the original play but then will change to +60ms in the replay and thus become a 100. In that case maybe frame times should be truncated/floored when being written out to replay?

smoogipoo commented 2 years ago

Probably not floored, because it works both ways - floor(-59.6) == -60. But truncation sounds reasonable.

smoogipoo commented 2 years ago

Probably not floored, because it works both ways - floor(-59.6) == -60. But truncation sounds reasonable.

Edit: I guess it doesn't matter since we're not dealing with relative time values... But that means I don't think just flooring is enough to avoid replay playback errors. You'd want to ceil early hits.

Walavouchey commented 1 year ago

relevant wiki page https://github.com/ppy/osu/wiki/Gameplay-differences-from-osu!stable#hit-window-edge-calculations-dont-match-stable (based on a quick glance of the discussion here, it might be missing some details for specific hit windows?)

Walavouchey commented 1 year ago

In the above wiki pr I've done some investigation about how stable does hit window comparisons. In summary:

Ruleset Comparison (stable) Comparison (lazer) Comparison (lazer replay)
osu! abs(round(hit error)) < floor(hit window) abs(hit error) <= hit window abs(round(hit error)) <= hit window
osu!taiko abs(round(hit error)) < floor(hit window), except for the miss window which uses <= abs(hit error) <= hit window abs(round(hit error)) <= hit window
osu!mania abs(round(hit error)) <= floor(hit window) abs(hit error) <= hit window abs(round(hit error)) <= hit window
A few examples | Case | Max error (formula) | Max error (lazer) | Error | Max error (lazer replay) | Error | Max error (stable) | Error | Delta (stable - lazer replay) | | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- | | OK window osu!, OD 10 | 60 | <= 60.0 | 0 | < 60.5 | +0.5 | < 59.5 | -0.5 | -1 | | OK window osu!, OD 5.4 | 96.8 | <= 96.8 | 0 | < 96.5 | -0.3 | < 95.5 | -1.3 | -1 | | OK window osu!, OD 5.6 | 95.2 | <= 95.2 | 0 | < 95.5 | +0.3 | < 94.5 | -0.7 | -1 | | OK window osu!mania, OD 5.4 | 110.8 | <= 110.8 | 0 | < 110.5 | -0.3 | < 110.5 | -0.3 | 0 | | OK window osu!mania, OD 5.6 | 110.2 | <= 110.2 | 0 | < 110.5 | +0.3 | < 110.5 | +0.3 | 0 |
Some theoreticals | Case | Max error (formula) | Max error (lazer replay if ceiled&floored) | Max replay frame error | Max error (lazer replay if floored&ceiled) | Max replay frame error | Max replay frame error (stable) | :-- | :-- | :-- | :-- | :-- | :-- | :-- | | OK window osu!, OD 10 | 60 | < 61 | 60 | <= 60 | 60 | 59 | | OK window osu!, OD 5.4 | 96.8 | < 97 | 96 | <= 96 | 96 | 95 | | OK window osu!, OD 5.6 | 95.2 | < 96 | 95 | <= 95 | 95 | 94 | | OK window osu!mania, OD 5.4 | 110.8 | < 111 | 110 | <= 110 | 110 | 110 | | OK window osu!mania, OD 5.6 | 110.2 | < 111 | 110 | <= 110 | 110 | 110 | - "ceiled&floored" is smoogi's suggestion. It effectively means `floor(abs(hit error))`. - "floored&ceiled" is the reverse of smoogi's suggestion. It effectively means `ceil(abs(hit error))`. Neither of these two give correct effective hit windows (max error) for any case. The comparisons simply aren't the same. Of course, true hit timing aside, if we only compare replays between stable and lazer where we only see the ints, "ceiled&floored" would match stable in the case of osu!mania, but not for osu! (as in, you can be off by a whole millisecond in lazer but not in stable). Same for "floored&ceiled".

In short:

I haven't been able to verify this in-game though as I don't have the tooling or time to do so. Maybe there's a relevant test scene in lazer, or if not maybe one of those could be cooked up (comparing against stable rounding/truncation)?

LegacyBeatmapEncoder applies a rounding operation to replay frames:

Rounding the hit time for replays should be correct, but I'm not entirely sure what that implies. Are these rounded hit times used for replay playback of plays set on lazer/stable or only for .osr exports?

I believe osu!taiko also uses <= for the normal hit (strongs are still <).

I haven't been able to find anything of the sort in the stable reference code.

In stable, taiko misses are given here, the comparison for which (<=) can be seen here and here (variable name for the "miss window" is a bit misleading). Greats and Ok's are compared with <. There's no difference between regular and large taiko notes from what I can tell.

You'd want to ceil early hits.

Both stable and lazer use absolute hit error, so it's always comparing two positive values. There's no special case to consider here, especially since hit times are rounded in stable anyway.

https://github.com/ppy/osu/blob/d38316bf4f552629cf86dc5ccfe78c3f2fd95f27/osu.Game/Rulesets/Scoring/HitWindows.cs#L127-L138

Where timeOffset is the hit error (positive or negative, but made positive above). This is the same in stable (example).

smoogipoo commented 1 year ago

Rounding the hit time for replays should be correct, but I'm not entirely sure what that implies

Both stable and lazer use absolute hit error, so it's always comparing two positive values. There's no special case to consider here, especially since hit times are rounded in stable anyway.

There are two issues presented here: one is the hit error as you're playing the game, and the other is the timing error of replays. Legacy replays only store integer time deltas whereas lazer deals with floating-point time, so lazer rounds the time values to write into the replay.

Suppose you have a hitobject at 500ms. The user hits a circle at -59.6ms delta lazer-time (i.e. 440.4ms). Should this value be rounded up or down for storage in the replay. Conversely, suppose the user hits at +59.6ms, should this value be rounded up or down? My proposal is to ceil the former and floor the latter, that way even though there's still up to 1ms discrepancy it will always be the correct judgement.

There's no difference between regular and large taiko notes from what I can tell.

I believe your assessment here, and for the other hit results, is correct.

Walavouchey commented 1 year ago

Suppose you have a hitobject at 500ms. The user hits a circle at -59.6ms delta lazer-time (i.e. 440.4ms). Should this value be rounded up or down for storage in the replay. Conversely, suppose the user hits at +59.6ms, should this value be rounded up or down?

Rounding the value emulates the rounded-integer temporal granularity of stable. With that, the only difference for replays in lazer (with integer time values) is that they're still compared against doubles, non-floored hit windows (as well as the < vs <= thing). Using ceil and floor (instead of round) would only cause yet another difference between stable and lazer.

This affects the lazer part of the examples I presented I suppose. I've updated them with lazer replay timings and also made it all into a table.

even though there's still up to 1ms discrepancy it will always be the correct judgement.

Well I thought through the implications of this... (see updated examples section) but I can't convince myself that this would solve anything.

bdach commented 1 year ago

Reading through this again, I see two options:

This one is probably too technical for community discussion so we'll probably have to come to a conclusion internally. @ppy/team-client anyone wants to sound off on this, or is this going to be a matter of "try both and see which works best"?

Walavouchey commented 1 year ago

I haven't been able to verify this in-game though as I don't have the tooling or time to do so. Maybe there's a relevant test scene in lazer, or if not maybe one of those could be cooked up (comparing against stable rounding/truncation)?

is there a test scene for this? it would be invaluable for both outlined options

bdach commented 1 year ago

I'm not sure how such a test scene would look because it would require... including stable logic? Or I guess capturing some ground-truth stable replays and seeing how lazer handles them.

For now I mostly tagged to try and get some feelers out on the technical direction, to see if there are any preferences. Chances are that development on this task is gonna entail writing that test scene, then both variants, then choosing one.

peppy commented 1 year ago

I prefer making alterations to HitWindows if possible. Reasons:

The question I'd then have is: will this break backwards compatibility (playing lazer replays on stable) and is that something we even care about?


I'm also not against the other proposal if it turns out this is too difficulty to implement in a nice way.

bdach commented 1 year ago

will this break backwards compatibility (playing lazer replays on stable) and is that something we even care about?

I can't say if this specific proposed change will break anything, and that's probably to be tested at the implementation phase, but our backwards compatibility of replays is already pretty jank if I'm honest. Like, for one thing, we're still using the legacy format, but also allowing scores with lazer-only mods to be exported to that old format, so if a lazer mod is anything more then visual, then it can play in a completely broken manner on stable. It only sorta kinda works because we're appending some special lazer secret sauce that stable can't read.

As to whether the above should be fixed (not by porting mods over, obviously, more like something like stable reading the lazer secret sauce and loudly refusing to do anything with the replay), no idea, and out of scope here.

I think with the general feelers out of the way there's not much to discuss here further until someone actually takes this on, tries to implement either option, and sees what happens.

Wleter commented 1 year ago

Changing ResultFrom in HitWindows to virtual and overriding this method for the specific ruleset, according to the table sent by @Walavouchey, fixed the discrepancy in the replay.

In particular for OsuHitWindows the change was

public override HitResult ResultFor(double timeOffset)
{
    timeOffset = Math.Abs(Math.Round(timeOffset));

    for (var result = HitResult.Perfect; result >= HitResult.Miss; --result)
    {
        if (IsHitResultAllowed(result) && timeOffset < Math.Floor(WindowFor(result)))
            return result;
    }

    return HitResult.None;
}

Is this valid solution for this or am I missing something.

I also send the branch I'm working on, or should I make PR or draft PR. Obviously it needs more testing for osu and other rulesets.

bdach commented 1 year ago

@Wleter if you ask me then any attempt at this should likely include test cases in code and a comprehensive cross check against stable, which you currently don't have.

https://github.com/ppy/osu/pull/24636 also materially affects this (also touches mania hit windows). The two need to work together correctly.

We were also thinking that possibly this hit window interval clusivity thing should be a ctor argument instead (see relevant discord conversation).

I'd say that unless you're willing to put the effort into testing multiple approaches and testing the results as already discussed above it's probably best left to one of us given that we have access to stable source.

Wleter commented 1 year ago

@bdach sure I'll leave it to you since you have stable source.

peppy commented 11 months ago

Thinking about this from a practical perspective, I do wonder about floating point precision on time values. We are already capping the game (input) to 1000 hz, and I do not foresee that changing. So the real question is how much value there is in keeping double precision in the first place.

After rounding, the maximum error with int based timing is 0.5 ms, which is arguably well within expectations of precision (the tightest timing in the game is ±11.5 ms for an osu!mania MAX with HR applied, making the precision around 2% of this judgement).

I may want to rescind my opinion above and go with the simple approach here..

bdach commented 11 months ago

I may want to rescind my opinion above and go with the simple approach here..

I'm not sure I can judge which approach is the simple one without trying both if I'm honest with you.

peppy commented 11 months ago

Simple in terms of "matching stable means we don't have to track these discrepancies across multiple implementations". But I guess we are using double everywhere for time already so I can see what you mean.

smoogipoo commented 9 months ago

I don't think this is something that is a blocker for the path-to-ranked project since it mostly affects (very few) replays and is otherwise unnoticeable in gameplay. We can fix this at any point later on, with likewise minimal effect to gameplay.

Paturages commented 9 months ago

Adding another sample, just in case (same settings as https://github.com/ppy/osu/issues/25973)

Beatmap
Gameplay score
Replay score
Replay file

since it mostly affects (very few) replays

As high-level osu!mania gameplay is inherently reliant on tight timing windows, I think this is probably bound to happen more often than not for osu!mania replays. I don't think it's a blocker for ranked play though, as long as the gameplay score (not the replay one) makes sense.

peppy commented 9 months ago

This does affect both gameplay and replays FWIW

peppy commented 8 months ago

Bumping this to p0 until i or someone else investigates.

Detze commented 8 months ago

As noted, currently lazer hit windows are too lenient for standard and taiko, and too strict for mania, compared to stable hit windows. For OD 10 in standard, the OD hit window formula gives 20 ms for the 300 hit window. In reality, true hit error in stable, as measured by the game's clock, can be in the interval (-19.5 ms, 19.5 ms) to get a 300, as described by the stable comparison formula. In lazer, a new, more intuitive formula is used, which results in the interval of [-20 ms, 20 ms], a 0.5 ms difference, a non-trivial reduction in difficulty. The effect is even more pronounced for fractional ODs, as stable floors the hit window, while lazer doesn't, resulting in up to 1.3-1.5 ms of difference. The disparity disadvantages existing stable scores in standard and taiko, and advantages existing stable scores in mania. This means that if an obsoleting of stable scores isn't desired, then these stable's hit windows must be matched in lazer, and thus the intuitive formula sadly cannot be followed as it is.

This also means that 20 ms that is obtained from the OD hit window formula is no longer the "true" hit window - it's actually 19.5 ms. If the player asks for the actual hit window (eg. by hovering over the OD value in song select), it would therefore be best to return 19.5 ms, and not 20 ms. As a side note, it is the "true" hit window value that gets divided by speed rate, not the result of the OD hit window formula (except for mania, of course).

There is a disadvantage to rounding player inputs to integers - it introduces a small amount of additional variance not due to the player's input variance. This play, set on stable, has the UR of 28.84. Had it been played on lazer, it would be expected to have about 28.55 UR. This effect is less pronounced the higher the UR. Thus, it is preferable to use double for hit error.

This also means that downgrading double to int almost always changes the UR of the replay by a slight amount. This can be observed in Paturages's comment above. The UR value changing between gameplay result screen and replay result screen, while only changing slightly, could be confusing. The gameplay UR should be the one considered correct, not the replay one. That's an issue for another thread though.

An idea to resolve the issue would be to find a formula to use as the judgement formula which is equivalent to stable's, and works the same regardless of whether input is rounded (or transformed in another way, as we can modify LegacyScoreEncoder) or not, so that outputting legacy replays works. This is possible, and without rounding hiterror. The formula is abs(hit_error) < floor(hit_window) - 0.5 for standard and taiko (exclusive hit windows), and abs(hit_error) < floor(hit_window + 1) - 0.5 for mania (inclusive hit windows). They are equivalent to stable's for any arguments, except hit_error half-integer. This formula also has a nice property of just having the "true" hit window formula on the right hand side and just hit error on the left hand side.

Proof For x not a half-integer, and N integer, `abs(Math.Round(x)) < N` is equivalent to `abs(x) < N - 0.5`.
Proof round() matches Math.round() everywhere except possibly half-integers. thus here they're equivalent. useful Math.Round() properties for x not half-integer: Math.Round(x) < N is equivalent to x < N - 0.5 Math.Round(x) > N is equivalent to x > N + 0.5 case I: round(x) > 0 (this means x > 0.5, thus x = abs(x)) abs(round(x)) < N <=> round(x) < N <=> x < N - 0.5 <=> abs(x) < N - 0.5 qed case II: round(x) < 0 (this means x < -0.5, thus -x = abs(x)) abs(round(x)) < N <=> -round(x) < N <=> round(x) > -N <=> x > -N + 0.5 <=> -x < N - 0.5 <=> abs(x) < N - 0.5 qed case III - round(x) = 0 <=> x in (-0.5, 0.5), so abs(x) < 0.5 LHS: abs(round(x)) < N <=> 0 < N RHS: abs(x) + 0.5 < N thus abs(x) + 0.5 < 1, so this is true iff 1 <= N, which is equivalent to LHS qed
Hence, for x not half-integer, the stable judgement formula for standard and taiko is equivalent to `abs(hit_error) < floor(hit_window) - 0.5`. For mania and taiko miss hit windows, stable's formula is equivalent to `abs(round(hit_error)) < floor(hit window) + 1`, as both sides of the inequality are integers. This is equivalent to `abs(hit_error) < floor(hit window) + 0.5` and `abs(hit_error) < floor(hit_window + 1) - 0.5`.

What about the hit_error half-integer case? Stable uses Math.Round(), which uses a to-even tiebreaker method. Of course, the half-integers that aren't close to the "true" hit window edge work as expected for both stable's and the proposed formula. However, in stable, half-integers at the edge can be either judged as in or out, depending on the OD. The proposed formula simply considers them all out, regardless of OD. It's not impossible to support to-even half-integer rounding in the judgement check method, but it's pointlessly complex just to not make hitting accurately harder by less than a nanosecond, and the new formula is more consistent. I would say half-integer hit errors not resulting in the same hit result in gameplay as in stable is a non-issue. It's only meaningful in the context of legacy score encoding (and even then, it's a very rare issue).

Not sure if I'm understanding legacy score encoding correctly, but from looking at the linked code it seems to me that it's currently not working properly. For example, on osu!mania with OD 10, a hit that is 34.4 ms late would be a Good on live lazer, but rounded, it would be a hit error of 34 ms, which, when replayed, would be a Great. This would be fixed with the new proposed formula (by rounding, and with away-from-zero (in terms of hit error) tiebreaker to handle the half-integer case).

Proof New judgement formula (for standard and taiko): LHS: abs(hit error) < floor(hit window) - 0.5 Thesis: new lazer replay formula that gives the same results as the above formula is: RHS: abs(round(hit error)) < floor(hit window) - 0.5 with round() - tiebreaker whatever we want RHS can drop the -0.5 because it's two integers (for integers, N < M iff N < M - 0.5) abs(round(hit error)) < floor(hit window) for not half-integer hit error, this is equivalent to abs(hit error) < floor(hit window) - 0.5 qed for half-integer hit error: only half-integer exactly at the "true" hit window edge need careful inspection - if the half-integer is (in absolute value) smaller/larger than the exact edge, then it is in/out by a comfortable margin and thus the result doesn't change between the two formulas for the new judgement formula, half integers are considered out. thus, in round(), we want to: for hit error > 0: ceil the hit error for hit error < 0: floor the hit error in total, we want round() to use away-from-zero tiebreaking The mania and taiko miss hit window proof is the same, with an extra +1 on the RHS.

Unfortunately, from what I can see, LegacyScoreEncoder operates on replay frame time values, not hit errors. We would need to know the time of the object being hit (then, with the new proposed formula, we could find the correct integer to downgrade to). As it is, a 34.4 ms early hit would be x.6 in beatmap time, and thus change the rounding direction compared to a 34.4 ms late hit. Because of this, it doesn't seem to be possible to round in the proper direction with just the replay frame time value. Because of this, we would need to know the time of the object being hit to correctly round half-integer hit errors (again, it's a very rare case, but once in a blue moon this might create a mismatch between gameplay and legacy replay).

After this issue is resolved, previously set lazer scores might have higher accuracy than they should, if there were hits at the edge (for example, hits in the [19.5 ms, 20 ms] range on OD 10 in standard). Unfortunately, it's either them or all the historical scores from stable that will be inaccurate, so the choice seems clear. This is why I would say this is actually an important issue to get right. Correcting ruleset rules after scores had been set on the leaderboards and pp awarded could be difficult.

I've submitted a PR that aims to resolve the issue using this approach here.

peppy commented 8 months ago

Stable uses Math.Round()

Out of curiosity and concern, how are you checking against stable when the code is not published? Usually we'd have people ask for the relevant extract of code before making statements like this.

Detze commented 8 months ago

I claimed that based on Walavouchey's investigation with stable reference code, the fact that Math.Round() is the method one would normally use to round in C#, and that indeed that's what was used in LegacyScoreEncoder.

Detze commented 8 months ago

Your comment made me think more about the rounding, and its relation to legacy score encoding. The part of the formula in Walavouchey's table responsible for the hit error reads simply: round(hit_error). But this, in fact, hides one more operation, since hit error = hit time - object time. Therefore, in stable, rounding could actually be applied in either of two places (or even both, but for stable, that would be equivalent to (II)):

Reading Walavouchey's table, I assumed (I) was the used formula, and my remarks about legacy score encoding in the previous comment were based on that assumption. But in fact, the used formula could also be (II), and if that's the case, then the legacy score encoding can be rounded in the correct direction with just the replay frame value, as from that formula it can be seen that rounding the time value is correct to downgrade to int in legacy score encoding, as it basically repeats the same operation.

If object_time is an integer, and osu!stable hit object times fortunately are integers, these are equivalent (proof omitted). I was wrong in the "Unfortunately, from what I can see {...}" paragraph. The rounding direction does change, but that's the correct behavior to downgrade a fractional value to a symmetrical interval of integers.

However, this is the correct way to downgrade only if the judgement formula is equivalent to stable's, and my '34.4 ms late hit' point still stands. For example, in mania, a 16.4 ms off (either way) hit would be Perfect in stable, and in lazer with my PR, but would be a Great in live lazer (if Perfect hit window was 16 ms), while being rounded to 16 ms during legacy score encoding, which would be a Perfect.

Moreover, in lazer, the object time is a double. The two formulas are not equivalent for fractional object times. For double object times, the legacy score encoding will work correctly only if stable followed the (II) formula. Although I would not expect stable to be able to replay lazer's double object time maps anyway.

Detze commented 8 months ago

I'm a bit confused how it's possible that a replayed osu!mania lazer play doesn't have the exact same result counts? https://github.com/ppy/osu/issues/25973 was closed with a comment pointing to LegacyScoreEncoder code, and I don't quite see how that's relevant. Does immediately replaying an osu!mania lazer play somehow use legacy encoded replays?

Detze commented 8 months ago

I'm a bit confused how it's possible that a replayed osu!mania lazer play doesn't have the exact same result counts? #25973 was closed with a comment pointing to LegacyScoreEncoder code, and I don't quite see how that's relevant. Does immediately replaying an osu!mania lazer play somehow use legacy encoded replays?

Reading through Player.cs it seems that legacy rulesets indeed use legacy replays in lazer. That's surprising to me, I thought lazer would have its own new, more accurate replay format.

If anyone was wondering, here's how the results are counted differently in Paturages's mania replays:

Maboroshi: the map is OD 8.3. live lazer's Perfect hit window is 15.77 ms, and Great hit window is 39.1 ms. In stable, they're 16.5 ms and 39.5 ms. In gameplay, hits that are off by x in [15.5, 15.77] ms are Perfect, but in the replay they get rounded to 16 ms and are thus Great. This is why the replay's Perfect count is lower. On stable, the Perfect count would be even higher than in lazer gameplay, and so would the sum of Perfect and Great counts. Neither lazer gameplay's nor replay's result counts are correct from stable's point of view. Counts in lazer with my PR are 2117/418/23/1/0/2, which looks legit.

The Weekend: the map is OD 8; live lazer's Perfect hit window is 16.1 ms, stable's is 16.5 ms. The rest of the hit windows are all 0.5 ms wider in stable because of rounding. In gameplay, hits that are off by x in (16.1, 16.5] are Great, but in the replay they get rounded to 16 ms and are thus Perfect. Similarly, all the other results have 0.5 ms more lenient hit windows in replay and stable, and this is why the replay has higher accuracy than the gameplay across the board. The hit result counts in the replay are actually the correct result counts from stable's point of view, as the judgement formulas are equivalent, and legacy score encoding rounds the same way as stable. I get the same hit results counts in lazer with my PR.

bdach commented 4 months ago

I'm deprioritising this from p0. https://github.com/ppy/osu/pull/26452 exists but it hasn't been reviewed by anyone but me despite being open for months and judging from references to it from elsewhere we might not be interested in doing anything about it at all. Unless I'm wrong @ppy/team-client?