twohoursonelife / OneLife

Two Hours One Life, building upon One Hour One Life. Join us on Discord to play.
https://twohoursonelife.com

Repeated Linux crash on v20313 #237

Closed selb closed 2 months ago

selb commented 2 months ago

Sequence of events leading to the repeated crash:

gdb backtrace from one reconnect crash, and valgrind output from another reconnect:

https://gist.github.com/selb/50fb940347e9b96c465bab46d3cf8d7f

EDIT: vanilla, not hetuw :)

connorhsm commented 2 months ago

Thank you. We're aware of this and already looking into it, and we appreciate the detailed logging.

Initial investigation: I've set up a fresh client on my Kubuntu machine and have not been able to reproduce the issue while playing in a town with ~3 others or /die'ing between multiple families.

selb commented 2 months ago

stripFertilitySuffix() looks most suspicious, since a notable difference from your testing is that this was a private spawn code where we were doing NO BB. This code in particular is invalid as foundSuffix points within name's allocation:

            delete [] name;
            foundSuffix[0] = '\0';

And I can confirm that the other player was as yet unnamed at the time, so this branch would have been taken.
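To illustrate the ordering problem, here is a minimal sketch of a safe version of that pattern. It is not the project's actual code: `stripSuffixSketch`, its signature, and the idea of returning a fresh copy are all hypothetical. The only point is that the terminator has to be written while `name` is still allocated, and the buffer freed afterwards.

```cpp
#include <cstring>

// Hypothetical helper (not the real stripFertilitySuffix): truncates a
// heap-allocated C string at the first occurrence of `suffix`, returns a
// newly allocated copy, and frees the original buffer.
char *stripSuffixSketch( char *name, const char *suffix ) {
    char *foundSuffix = std::strstr( name, suffix );

    if( foundSuffix != nullptr ) {
        // foundSuffix points inside name's allocation, so this write
        // must happen BEFORE name is deleted.
        foundSuffix[0] = '\0';
    }

    // Copy the (possibly truncated) string into its own allocation.
    std::size_t len = std::strlen( name );
    char *result = new char[ len + 1 ];
    std::memcpy( result, name, len + 1 );

    // Only now is it safe to release the original buffer.
    delete [] name;

    return result;
}
```

The caller owns the returned buffer and releases it with `delete []`. Simply swapping the two lines quoted above would likely address the invalid write too, assuming nothing else reads `name` after the delete.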

connorhsm commented 2 months ago

@zabala6 initially reported this following the release. Since then, they and @TanyaPegasus have reported the issue repeating, with reproduction steps, on Discord. With this, I've been able to experience the crash myself.

My repro:

  1. Spawn using spawn code
  2. Immediately use /die
  3. Select "OK" and then "Get reborn"
  4. Immediately say "no bb"
  5. Immediately then use /die
  6. Client crashes without showing the death screen

I was never able to trigger the crash before /die'ing once.

As mentioned by others, I found that if I waited some amount of time by walking around or naming myself, the crash would not occur.

I believe this aligns with what you're suggesting @selb.

selb commented 2 months ago

Yeah, when doing that sequence, the only invalid write valgrind sees is in stripFertilitySuffix().

And it makes sense that not naming yourself (or observing an unnamed infertile player) would increase the likelihood of the crash: this makes the zeroing occur at the beginning of the allocation, which is way more likely to have been reused for allocator metadata than a byte that is 5+ bytes in, after a typical name.
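A rough sketch of where the stray zero byte lands in each case. The helper name `suffixOffsetSketch` and the literal suffix text `+INFERTILE+` are assumptions for illustration only; the real strings in the client may differ.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the byte offset within `name` at which
// `suffix` starts (where the '\0' would be written), or -1 if the
// suffix is not present.
std::ptrdiff_t suffixOffsetSketch( const char *name, const char *suffix ) {
    const char *found = std::strstr( name, suffix );
    if( found == nullptr ) {
        return -1;
    }
    return found - name;
}
```

With these assumed strings, an unnamed player whose whole label is `+INFERTILE+` gives offset 0, so the write hits the very first byte of the freed block. A named player such as `EVE SMITH +INFERTILE+` gives offset 10, so the write lands further into the block.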

risvh commented 2 months ago

Another way to reproduce the crash: hover over another player who is unnamed and infertile. This works too because both the death message and the name displayed below the character call the same function.

Thanks Selb for pinpointing the cause!