ppy / osu-stable-issues

Report critical osu-stable issues here

Score submission congestion prevents chat connection #818

Open aticie opened 3 years ago

aticie commented 3 years ago

When in a multiplayer setting, if your score can't get submitted, it doesn't time out for a very long time (I waited for 2 minutes).

And this prevents your chat messages from being sent to IRC chat as well. If the profile banner shows "Submitting score...", you can receive messages from chat, but can't send any.

This doesn't make sense: if you can receive messages, your connection is OK, so you should be able to send messages as well. It also blocks cho messages such as readying up, picking mods, and leaving the multiplayer lobby.

Edit: This is detrimental in tournaments because sometimes you have to wait for a very long time for the client to send your score. Or you have to restart the game in order to fix it after every match.

peppy commented 3 years ago

Score submission and chat are two completely separate systems. This sounds like a network issue at your end affecting all network operations.

peppy commented 3 years ago

If you would like further investigation to happen here, at bare minimum we would need network.log after reproducing the issue on cutting edge.

You should also keep a ping session open to a known third party to monitor your internet for issues.
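
A ping session is easier to line up against network.log afterwards if every probe carries a timestamp. The following is a minimal sketch in Python, assuming a TCP connect probe is an acceptable stand-in for ICMP ping; the target host, port, and interval are placeholders, and `probe`, `format_line`, and `monitor` are hypothetical helper names that are not part of osu! or of any tool mentioned in this thread.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return None
    return (time.perf_counter() - start) * 1000.0


def format_line(when, host, latency_ms):
    """Format one log line using the same timestamp style as network.log."""
    stamp = when.strftime("%Y-%m-%dT%H:%M:%S")
    if latency_ms is None:
        return f"{stamp} {host} FAILED"
    return f"{stamp} {host} {latency_ms:.0f}ms"


def monitor(host, port, interval=1.0, count=None):
    """Probe repeatedly; count=None keeps going until interrupted."""
    sent = 0
    while count is None or sent < count:
        print(format_line(datetime.now(), host, probe(host, port)))
        sent += 1
        time.sleep(interval)
```

Calling `monitor("google.com", 443)` in a separate terminal while reproducing the issue would give a log whose timestamps can be compared directly with the timed-out requests.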

aticie commented 3 years ago

> If you would like further investigation to happen here, at bare minimum we would need network.log after reproducing the issue on cutting edge.
>
> You should also keep a ping session open to a known third party to monitor your internet for issues.

network.log

Here's the network.log

I noticed that this issue happens when someone (tournament streamers) spectates you and if the client can't send frames in time, at the end we will have the "Submitting score..." bug.

The issue probably stems from the congestion of sending replay frames. But as you can see from the video below, my network connection is not interrupted as I had a ping session open to google.com. (See 3:41)

Here's a video demonstrating this:

https://streamable.com/ltaa8p

Highlights:

  • 00:00: Multiplayer starts
  • 01:01: First connection issue happens. This lasts for almost a minute, ends at 1:59.
  • 02:08: Second connection issue. Although the map is finished by now, the spectator states that I'm still at the middle of the map.
  • 02:40: Demonstration of chat not working when waiting for "Submitting score....". BanchoBot doesn't answer to !roll. I can't send messages, but I can receive others' messages.
  • 03:02: Shows cho protocol not working, can't ready up.
  • 03:41: Uninterrupted ping session to google.com

Please open this issue back as it is not on my end. I had an uninterrupted ping session + an uninterrupted discord call. My network connection is fine.

Edit: For network logs,

  • 2021-08-14T20:40:53: Match starts
  • 2021-08-14T20:41:53: 1 minute into map, first connection issue. RequestTimeout lingers until 20:42:50
  • 2021-08-14T20:42:59: Second minor connection issue starts here.
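
The log timestamps can be cross-checked against the video highlights by converting them to offsets from the match start. This is a small sketch, assuming the video's 00:00 corresponds to the logged match start and that the "20:42:50" time is on the same date as the other entries; the labels and function names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the network.log notes above.
EVENTS = [
    ("match starts", "2021-08-14T20:40:53"),
    ("first connection issue", "2021-08-14T20:41:53"),
    ("RequestTimeout clears", "2021-08-14T20:42:50"),  # date assumed
    ("second connection issue", "2021-08-14T20:42:59"),
]


def offsets_from_start(events):
    """Return (label, seconds since the first event) for each event."""
    parsed = [
        (label, datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"))
        for label, stamp in events
    ]
    start = parsed[0][1]
    return [(label, int((when - start).total_seconds())) for label, when in parsed]


def as_video_time(seconds):
    """Render a second count as MM:SS, like the video highlights."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


for label, seconds in offsets_from_start(EVENTS):
    print(as_video_time(seconds), label)
# 00:00 match starts
# 01:00 first connection issue
# 01:57 RequestTimeout clears
# 02:06 second connection issue
```

Under those assumptions, the offsets land within about two seconds of the video highlights (01:01, 1:59, 02:08), and the first RequestTimeout spans 57 seconds.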

peppy commented 3 years ago

Thanks for the log!

A few things:

So some more questions to try and determine what/where this issue lies:

  • Multiple times in the past, we've seen users' routers (or software like firewall / antivirus) limiting requests to a server for hardware or throughput reasons. There is a chance your router may be throttling requests. If you can reproduce this often, are you able to test on a 4G/5G connection, or on a different network?
  • Do you use wifi?
  • Are you able to send your IP address to me (pe@ppy.sh) next time this happens? I can check our web logs and also cloudflare's blacklist to make sure nothing dodgy is happening there.
  • Can you share your runtime.log from the same session? Also what windows version you are on.

aticie commented 3 years ago

peppy commented 3 years ago

@aticie did you have a chance to check on a different connection yet?

aticie commented 3 years ago

Yes, I tried from a different computer on an LTE network and the same thing happened.

If someone watches you and buffers, this submitting score issue happens. I don't have video footage, but I took some screenshots that I can show.

Here are the logs from the session with 4G (LTE) connection. Logs.zip

I've had ping session to google.com again, there were some latency fluctuations but that's probably due to nature of mobile data. An excerpt from the session:

Reply from 172.217.169.174: bytes=32 time=60ms TTL=110
Reply from 172.217.169.174: bytes=32 time=71ms TTL=110
Reply from 172.217.169.174: bytes=32 time=69ms TTL=110
Reply from 172.217.169.174: bytes=32 time=49ms TTL=110
Reply from 172.217.169.174: bytes=32 time=52ms TTL=110
Reply from 172.217.169.174: bytes=32 time=40ms TTL=110
Reply from 172.217.169.174: bytes=32 time=51ms TTL=110
Reply from 172.217.169.174: bytes=32 time=50ms TTL=110
Reply from 172.217.169.174: bytes=32 time=49ms TTL=110
Reply from 172.217.169.174: bytes=32 time=61ms TTL=110
Reply from 172.217.169.174: bytes=32 time=64ms TTL=110
Reply from 172.217.169.174: bytes=32 time=60ms TTL=110
Reply from 172.217.169.174: bytes=32 time=55ms TTL=110
Reply from 172.217.169.174: bytes=32 time=46ms TTL=110
Reply from 172.217.169.174: bytes=32 time=49ms TTL=110
Reply from 172.217.169.174: bytes=32 time=56ms TTL=110
Reply from 172.217.169.174: bytes=32 time=68ms TTL=110
Reply from 172.217.169.174: bytes=32 time=49ms TTL=110
Reply from 172.217.169.174: bytes=32 time=46ms TTL=110
Reply from 172.217.169.174: bytes=32 time=55ms TTL=110
Reply from 172.217.169.174: bytes=32 time=61ms TTL=110
Reply from 172.217.169.174: bytes=32 time=59ms TTL=110
Reply from 172.217.169.174: bytes=32 time=130ms TTL=110
Reply from 172.217.169.174: bytes=32 time=62ms TTL=110
Reply from 172.217.169.174: bytes=32 time=48ms TTL=110
Reply from 172.217.169.174: bytes=32 time=48ms TTL=110
Reply from 172.217.169.174: bytes=32 time=48ms TTL=110
Reply from 172.217.169.174: bytes=32 time=64ms TTL=110
Reply from 172.217.169.174: bytes=32 time=45ms TTL=110
Reply from 172.217.169.174: bytes=32 time=59ms TTL=110
Reply from 172.217.169.174: bytes=32 time=59ms TTL=110
Reply from 172.217.169.174: bytes=32 time=58ms TTL=110
Reply from 172.217.169.174: bytes=32 time=54ms TTL=110
Reply from 172.217.169.174: bytes=32 time=58ms TTL=110
Reply from 172.217.169.174: bytes=32 time=53ms TTL=110
Reply from 172.217.169.174: bytes=32 time=93ms TTL=110
Reply from 172.217.169.174: bytes=32 time=54ms TTL=110
Reply from 172.217.169.174: bytes=32 time=67ms TTL=110
Reply from 172.217.169.174: bytes=32 time=53ms TTL=110
Reply from 172.217.169.174: bytes=32 time=48ms TTL=110
Reply from 172.217.169.174: bytes=32 time=60ms TTL=110
Reply from 172.217.169.174: bytes=32 time=66ms TTL=110
Reply from 172.217.169.174: bytes=32 time=51ms TTL=110
Reply from 172.217.169.174: bytes=32 time=77ms TTL=110
Reply from 172.217.169.174: bytes=32 time=66ms TTL=110
Reply from 172.217.169.174: bytes=32 time=73ms TTL=110
Reply from 172.217.169.174: bytes=32 time=48ms TTL=110
Reply from 172.217.169.174: bytes=32 time=53ms TTL=110
Reply from 172.217.169.174: bytes=32 time=45ms TTL=110
Reply from 172.217.169.174: bytes=32 time=51ms TTL=110

There were no dropped packets.
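
A ping excerpt like the one above can be summarized mechanically rather than read by eye. The sketch below assumes the Windows `ping` output format shown in this comment ("Reply from ..." lines, with "Request timed out." for lost probes); `summarize` is a hypothetical helper, and the sample lines are three replies taken from the excerpt plus one made-up timeout to exercise the loss counter.

```python
import re

# Matches Windows ping replies such as
# "Reply from 172.217.169.174: bytes=32 time=60ms TTL=110".
REPLY = re.compile(r"Reply from \S+ bytes=\d+ time[=<](\d+)ms TTL=\d+")


def summarize(lines):
    """Count replies and timeouts and report min/max/average latency."""
    times = []
    lost = 0
    for line in lines:
        match = REPLY.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "Request timed out" in line:
            lost += 1
    return {
        "sent": len(times) + lost,
        "lost": lost,
        "min_ms": min(times) if times else None,
        "max_ms": max(times) if times else None,
        "avg_ms": sum(times) / len(times) if times else None,
    }


sample = [
    "Reply from 172.217.169.174: bytes=32 time=60ms TTL=110",
    "Reply from 172.217.169.174: bytes=32 time=71ms TTL=110",
    "Reply from 172.217.169.174: bytes=32 time=130ms TTL=110",
    "Request timed out.",  # hypothetical line, not present in the excerpt
]
print(summarize(sample))
```

Feeding the full saved ping output through `summarize` would confirm the "no dropped packets" observation and show the latency spread in one line.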

As for the reproduction of the bug, I follow these steps:

This time, it said 'Submission Complete' but it was still stuck. I couldn't ready up nor select mods.

Also to add to this, today I had a tournament match and my opponent had the same thing happening:

https://www.twitch.tv/videos/1128198918?t=0h15m24s

Although can't be seen here, we were both stuck with the 'Submitting score...' bug. We both had to restart the game in order to fix. You can also see the referee kicking both of us because we still linger in the lobby even though we closed the game.

aticie commented 3 years ago

I really don't care about sharing my IP as it changes like daily, 176.88.136.24 was my IP when I was playing the tournament.

Also, this is like a common occurrence in tournaments but I don't see enough people reporting it. I see that most people brush it off by saying 'Bancho moment'.

peppy commented 3 years ago

> Select a long map and hope that they buffer

From our end, this makes no sense. Another player/spectator buffering cannot affect your network connection. Probably best to avoid inferring causality when reporting these issues.

peppy commented 3 years ago

It looks like in one of your videos you had a network monitor / dump application running. Any chance of being able to provide a full network dump? There's definitely something weird going on at the network level and since I can't reproduce this, that may be the only way to determine what is happening.

aticie commented 3 years ago

> > Select a long map and hope that they buffer
>
> From our end, this makes no sense. Another player/spectator buffering cannot affect your network connection. Probably best to avoid inferring causality when reporting these issues.

It is not causality, that's just how I reproduce the bug. If I see the network icon on the bottom right when someone is watching me, I ask them to confirm if they are buffering. That means that I most likely reproduced the bug.

The issue is not network-related but the behavior of the game.

If I can receive messages, usually I should be able to send messages as well. To add to this, it happens when I'm streaming too. My stream may or may not suffer minor packet loss, I can't confirm that, but it doesn't completely stop for a full 2-3 minutes. Even if it's a network issue, it doesn't justify not being able to send anything for a whole 2 minutes.

aticie commented 3 years ago

> It looks like in one of your videos you had a network monitor / dump application running. Any chance of being able to provide a full network dump? There's definitely something weird going on at the network level and since I can't reproduce this, that may be the only way to determine what is happening.

That's qbittorrent if you are asking for the blurred part. I don't want to share my illegal activities here.

peppy commented 3 years ago

There is no "outbound queue". I'm not sure where you are drawing conclusions from, but please refrain from this and leave that to those maintaining the code.

Have you attempted reproducing when you aren't running bittorrent? This kind of software can be known to cause network issues. You may be hitting some kind of connection limit.

aticie commented 3 years ago

> There is no "outbound queue". I'm not sure where you are drawing conclusions from, but please refrain from this and leave that to those maintaining the code.

ok, you can ignore my speculations. Since the game is not open-source, I can only assume things.

> Have you attempted reproducing when you aren't running bittorrent? This kind of software can be known to cause network issues. You may be hitting some kind of connection limit.

Yes, but I haven't had any problems on any other games while having torrent clients open. I'm not having any packet drop or connectivity issues when playing games like Apex Legends, Dota 2, Valorant, etc.. which should require more network bandwidth than osu.

I think I relayed the issue in detail. I've already shown the reproduction steps. I don't want to try all the possible situations. Even if I close bittorrent and the connection issues resolve, the solution shouldn't be 'just close torrent lol'. It can happen to anyone with a bad internet connection. It doesn't justify being outbound connection locked for 2 minutes.

This is also a common occurrence in tournaments. Just watch some in the weekend, you will notice casters saying 'Players are having Bancho issues'.

peppy commented 3 years ago

Let's wait for others that are willing to put in the effort to chime in then.

For anyone else experiencing this: to move forward I need some lower level network logs. There's likely something fishy going on either at cloudflare's end, or at a TCP level. The timed out requests are occurring due to actual network timeouts, and we need to discern why these are occurring in the first place.

Optimally a tcpdump/wireshark dump (with ssl MITM enabled) would be appreciated. If not, analysis at your end and/or screenshots of the requests in question at a raw wire level.

aticie commented 3 years ago

By the way, the reason I don't want to continue testing this is that it's not easy for an end-user like me to test.

In order to test it, I'm asking another player to spectate me and waste their potential 1 hour or so trying to reproduce it.

If we were allowed to open up second accounts, I would fire up a client on a cloud machine and have it spectate me 7/24.

If you want to reproduce it yourself, maybe you can try using https://github.com/jagt/clumsy.

peppy commented 3 years ago

We actually test using clumsy / other network conditioners on a regular basis. That doesn't help in this scenario. The issue is that network requests are timing out at the wire. I need to find the cause for that, not emulate it happening.