sauerbraten / p1xbraten

my patches on top of the official Cube 2: Sauerbraten client & server
15 stars 3 forks source link

player1 suicides when stopping while walking up stairs #26

Closed sauerbraten closed 3 years ago

sauerbraten commented 3 years ago

apparently, just like that. with no user input... (?)

reported by Shag as well as Eleisa.

sauerbraten commented 3 years ago

video of the bug provided by Eleisa: https://streamable.com/04vb7n a demo of it happening would be very helpful!

sauerbraten commented 3 years ago

p1xbraten_suicide_bug_xion.7.35.dmo.zip

sauerbraten commented 3 years ago

the demo shows a position packet with a Z position value > 0xFFFFFF (more precisely, a value in 0x[00-FF]000000) that is broadcasted by the server as three zero bytes, followed by a jumppad packet for the player (in this case, with ent ID 1, one of the jumppads on reissen).

this suggests a call chain along the lines of checkitems() -> trypickup() -> jumppadeffects(). however, I can't explain how the distance check in checkitems() passes, when the jumppad with ent ID 1 is at Z=~1290. (the affected player is at Z=~1424 in the demo, and not really close in terms of X and Y at all.) the suicide is then most likely triggered by the 0 Z value of the player position together with reissen's death floor.

sauerbraten commented 3 years ago

the last "known good" position packet of the bugging player is: 4 0 13 0 250 76 190 52 232 88 162 109 90 0 196 127

in this packet, the position's Z component is encoded as 232 88, which decoded comes out to 1422.5 (22760 on the wire, before downscaling). ((232 + 88<<8) / 16.0, 16.0 is the scaling factor for world locations.)

the same packet occurs 110 times unchanged, followed by: 4 0 9 4 250 76 190 52 0 0 0 162 109 90 0 196 127

in order for jumppadeffects() to be involved in this bug, the Z component has to be kinda close to the jumppad's when the distance is calculated inside checkitems(), and then be super high in jumppadeffects() where sendposition() is called. or the distance math suffers from an overflow of some sort.

also, notably, while jumppadeffects() does set physstate = PHYS_FALLING, it happens after the position was already sent, so this doesn't seem to be an explanation for the change in the third byte.

sauerbraten commented 3 years ago

another occurence: p1xbraten_suicide_bug_eleisa.9.20.zip

parsed with https://github.com/sauerbraten/dmolisher:

gamemillis, channel, data length, data (bytes in decimal)
46623, 0, 32, [4 2 5 0 90 33 194 32 129 36 157 102 90 0 146 126 4 0 4 0 238 42 16 17 225 37 86 105 90 0 152 127]
46623, 1,  5, [88 2 2 32 9]
46644, 0, 17, [4 2 1 4 90 33 194 32 0 0 0 157 102 90 0 146 126]
46644, 0,  3, [29 2 0]
46644, 0, 18, [4 2 1 12 90 33 194 32 0 0 0 157 102 90 144 1 32 253]
46644, 0,  4, [28 2 3 4]
46644, 0, 17, [4 2 1 8 127 56 0 29 255 35 153 101 90 144 1 32 253]
46644, 0,  3, [29 2 6]
46656, 0, 34, [4 2 1 24 127 56 0 29 37 36 153 101 90 38 1 32 253 2 4 0 4 0 238 42 16 17 225 37 86 105 90 0 152 127]
46661, 1,  5, [11 2 2 255 0]

same as in the first demo from reissen:

from then, it's different:

sauerbraten commented 3 years ago

the sequence in the second recorded case fits the theory that it's checkitems() going haywire, since the ents are all trypickup()ed in ascending order (2, 3, 6) and basically at the same time.

sauerbraten commented 3 years ago

Eleisa confirmed for me that it's reproducible (on macOS) by walking up stairs and stopping on them, so physstate == PHYS_STEP_UP is required and standing still while in that state seems to trigger the bug (~5 seconds after releasing all inputs).

sauerbraten commented 3 years ago

debug prints in checkitems()/trypickup() show that the entity's position's Y component as well as the feetposition's Z component get out of whack and are printed as "nan" by floatstr().

sauerbraten commented 3 years ago

the bug seems to only occur when building with clang and -O1/-O2/-O3. building the object files with gcc 11 and -O3, then linking in the final step using clang (for -f framework includes to work) produces a working binary.

sauerbraten commented 3 years ago

sometimes, instead of activating far-away ents, the hudgun disappears.

sauerbraten commented 3 years ago

building with -O3 and without -ffast-math also works fine. it seems somewhere the player's Z becomes NaN, and that trips up things like the distance function.

sauerbraten commented 3 years ago

the source of the bug seems to be that in this line https://github.com/sauerbraten/p1xbraten/blob/9c7ac12502d375d20e6e8d68f87a8f9ecb72808a/src/engine/physics.cpp#L1177 the expression dir.magnitude() sometimes, when dir is (0, -0, 0) returns -Infinity instead of the expected -0/+0. magnitude() is defined as https://github.com/sauerbraten/p1xbraten/blob/9c7ac12502d375d20e6e8d68f87a8f9ecb72808a/src/shared/geom.h#L131 and dir.squaredlen() still returns a correct 0.000000. something is weird with sqrtf() when clang uses -ffast-math and optimizations.

sauerbraten commented 3 years ago

the fix for now will be to build using gcc, which is also how the official binaries are built.