gschup / ggrs

GGRS is a reimagination of GGPO, enabling P2P rollback networking in Rust. Rollback to the future!

local_frame_advantage bigger than i8::MAX #35

Open cscorley opened 2 years ago

cscorley commented 2 years ago

Describe the bug: The quality-report calculations use i8, which can be too small (local_frame_advantage exceeds i8::MAX) during stress testing.

Expected behavior: No panic; just keep skipping frames.

Screenshots: I was testing a game with clumsy on Windows, using this configuration. [screenshot of clumsy settings]
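For illustration, a minimal sketch (not taken from ggrs; the frame numbers are made up) of how a frame advantage beyond i8::MAX fails to fit into an i8:

```rust
// Minimal sketch (not ggrs code): a frame advantage larger than i8::MAX
// cannot be represented as an i8, so a checked conversion fails and an
// unwrap at that point would panic.
fn main() {
    let local_frame: i32 = 200; // hypothetical local frame during a stall
    let remote_frame: i32 = 0;  // remote peer still stuck on its first frame
    let advantage = local_frame - remote_frame; // 200 > i8::MAX (127)

    let as_i8 = i8::try_from(advantage);
    println!("{:?}", as_i8); // Err(TryFromIntError(()))
}
```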

gschup commented 2 years ago

Thanks for your report!

While this would be easy to fix by going from i8 to i32 or comparable, I am very curious how this issue can arise. Usually, when advancing the gamestate without remote input, a PredictionThreshold error occurs to stop the user from advancing too far (the error is thrown here).

This should mean that local_frame_advantage should never be higher than the max prediction threshold, which by default is 8. In your example, you left that threshold at 8.
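For reference, a minimal sketch of the kind of guard described here, not ggrs's actual implementation: the session refuses to advance once it is predicting too far past the last confirmed frame.

```rust
// Sketch of a prediction-threshold guard (not ggrs's actual code).
#[derive(Debug)]
enum SessionError {
    PredictionThreshold,
}

struct Session {
    current_frame: i32,
    last_confirmed_frame: i32,
    max_prediction: usize, // 8 in this example, matching the default
}

impl Session {
    fn advance_frame(&mut self) -> Result<(), SessionError> {
        let frames_ahead = self.current_frame - self.last_confirmed_frame;
        if frames_ahead >= self.max_prediction as i32 {
            // The caller must stop simulating until remote input arrives,
            // so local_frame_advantage should normally stay small.
            return Err(SessionError::PredictionThreshold);
        }
        self.current_frame += 1;
        Ok(())
    }
}

fn main() {
    let mut session = Session {
        current_frame: 0,
        last_confirmed_frame: 0, // no remote input confirmed yet
        max_prediction: 8,
    };
    // Without confirmed remote input, the session halts after 8 frames.
    while session.advance_frame().is_ok() {}
    println!("halted at frame {}", session.current_frame);
}
```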

At this point I can only hypothesize, but maybe I made a mistake somewhere when casting the numerical types from unsigned to signed, and there is a silent overflow due to going negative? I will check it out later this week, probably on or shortly before the weekend.
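For illustration, a minimal sketch of the kind of silent wrap hypothesized here (the frame values are made up and this is not taken from ggrs):

```rust
// Subtracting unsigned frame counters and then casting to a signed type can
// silently yield a negative value. Plain `-` on unsigned integers panics in
// debug builds but wraps in release; wrapping_sub makes the wrap explicit.
fn main() {
    let local_frame: u32 = 5;
    let remote_frame: u32 = 10; // the remote peer is ahead of us

    let diff = local_frame.wrapping_sub(remote_frame); // 4294967291
    let as_i32 = diff as i32; // -5
    let as_i8 = diff as i8;   // -5 here, but garbage for larger differences

    println!("diff = {diff}, as i32 = {as_i32}, as i8 = {as_i8}");
}
```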

PS: Sounds great! 😄

gschup commented 2 years ago

Another thought: since the clumsy settings aren't too extreme (I have tried worse, both with real connections and in clumsy), could it be that one of the clients cannot keep up with the framerate? This could especially be the case with rapier and heavy physics...

cscorley commented 2 years ago

You may be right about it being related to the actual framerate and not the network. I have also tested this by placing a 3-second sleep (a crude way to approximate 180 dropped frames) in one of the client systems, and it seems to just "break through" the PredictionThreshold (after 111 of them) and allow one client to continue rendering, all before the other even completes its first frame. (I do not have any framerate limiting in Bevy itself, just the FPS limit set by bevy_ggrs.)
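For reference, a hedged sketch of that kind of artificial stall, assuming a Bevy 0.7-era API (add_system, Local); this is an illustration, not the reporter's actual setup:

```rust
// Sketch of an artificial stall in a Bevy client (Bevy 0.7-era API assumed).
// Sleeping inside a system blocks the whole schedule for roughly 180 frames
// at 60 FPS, so this client falls far behind its peer.
use bevy::prelude::*;
use std::{thread, time::Duration};

fn stall_once(mut done: Local<bool>) {
    if !*done {
        thread::sleep(Duration::from_secs(3));
        *done = true;
    }
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_system(stall_once)
        .run();
}
```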

In one test, I was able to get it to produce some "fun errors" on the clients (but I don't know which one was actually sleeping; I forgot to log that info):

client 1

thread 'main' panicked at 'view entity should exist: QueryDoesNotMatch(3v0)', C:\Users\cscorley\scoop\persist\rustup\.cargo\registry\src\github.com-1ecc6299db9ec823\bevy_core_pipeline-0.7.0\src\main_pass_2d.rs:45:14

client 2

thread 'main' panicked at 'assertion failed: frame_to_load != NULL_FRAME && frame_to_load < self.current_frame &&\n    frame_to_load >= self.current_frame - self.max_prediction as i32', C:\Users\cscorley\scoop\persist\rustup\.cargo\git\checkouts\ggrs-151f8bd9eb75dc5d\9e4a20a\src\sync_layer.rs:138:9
MaxCWhitehead commented 3 months ago

I have reproduced a couple of these while stepping through a ggrs test in a debugger (so there are larger delays in wall clock time). This does not play nicely with the code in ggrs that uses wall clock time and can cause issues.

I most commonly hit thread 'main' panicked at 'assertion failed: frame_to_load != NULL_FRAME && frame_to_load < self.current_frame &&\n frame_to_load >= self.current_frame - self.max_prediction as i32'. I believe this is due to a rollback being triggered for disconnect_frame 0 (the disconnect frame is chosen as first_incorrect) when current_frame is also 0.
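A small worked example of why that assertion fails when a rollback targets frame 0 while current_frame is still 0 (the values are chosen for illustration and mirror the assert text, not ggrs internals):

```rust
// Running this panics, mirroring the assert reported above: when both frames
// are 0, `frame_to_load < current_frame` is false, so the conjunction fails.
fn main() {
    const NULL_FRAME: i32 = -1;
    let frame_to_load: i32 = 0; // disconnect frame chosen as first_incorrect
    let current_frame: i32 = 0; // session has not advanced past frame 0 yet
    let max_prediction: i32 = 8;

    assert!(
        frame_to_load != NULL_FRAME
            && frame_to_load < current_frame
            && frame_to_load >= current_frame - max_prediction
    );
}
```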

When debugging, I commented out the time check that triggers network interrupt events, and after that I reproduced the assert in the original post. The remote frame in the local frame advantage computation is based on an estimate using ping + framerate; it does not use the true remote frame, so when debugging or under very poor network conditions it is possible for this value to get very large and trip the assert.
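A rough sketch of the kind of estimate described, not ggrs's actual formula; the function name and numbers are made up for illustration:

```rust
// Guess the remote frame from the last received frame plus how many frames
// should have elapsed over half the round-trip time (illustrative only).
fn estimated_remote_frame(last_received_frame: i32, ping_ms: i64, fps: i64) -> i32 {
    let frames_in_flight = (ping_ms / 2) * fps / 1000;
    last_received_frame + frames_in_flight as i32
}

fn main() {
    // Under a debugger (or terrible network conditions), wall-clock-based ping
    // measurements can balloon, so the estimate drifts far from reality and the
    // resulting frame advantage no longer fits the small integer types.
    let estimate = estimated_remote_frame(10, 30_000, 60); // a 30 s "ping"
    let local_frame = 12;
    let local_frame_advantage = local_frame - estimate;
    println!("estimate = {estimate}, local_frame_advantage = {local_frame_advantage}");
}
```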

(Not sure whether it is sound for this failure to occur considering I commented out the disconnect code, but just sharing notes on that.)

MaxCWhitehead commented 3 months ago

> You may be right about it being related to the actual framerate and not the network. I have also tested this by placing a 3-second sleep (a crude way to approximate 180 dropped frames) in one of the client systems, and it seems to just "break through" the PredictionThreshold (after 111 of them) and allow one client to continue rendering, all before the other even completes its first frame. (I do not have any framerate limiting in Bevy itself, just the FPS limit set by bevy_ggrs.)

I would recommend confirming that disconnect events are handled; I have a suspicion that may be the cause. Once a client is disconnected, GGRS will report this event and continue to advance.
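A minimal sketch of that recommendation; the Session and Event types below are placeholders rather than ggrs's real API (consult the ggrs docs for the actual session and event types):

```rust
// Drain session events every frame and react to disconnects. Without this,
// a disconnect goes unnoticed and the local client keeps advancing, which
// matches the "break through" behavior described above.
enum Event {
    Disconnected { player_handle: usize },
    Other,
}

struct Session {
    pending: Vec<Event>,
}

impl Session {
    fn events(&mut self) -> impl Iterator<Item = Event> + '_ {
        self.pending.drain(..)
    }
}

fn main() {
    let mut session = Session {
        pending: vec![Event::Disconnected { player_handle: 1 }, Event::Other],
    };

    for event in session.events() {
        match event {
            Event::Disconnected { player_handle } => {
                println!("player {player_handle} disconnected; stop predicting their input");
            }
            Event::Other => {}
        }
    }
}
```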