triplea-game / triplea

TripleA is a turn based strategy game and board game engine, similar to Axis & Allies or Risk.
https://triplea-game.org/
GNU General Public License v3.0
1.35k stars 399 forks source link

Disconnected from Pre Release lobby #5637

Closed beelee1 closed 4 years ago

beelee1 commented 4 years ago

joined the lobby on pre 16920 and after 25 minutes it logged me out. It happens every time ( 3 ) i've joined. Don't know if it matters but I was the only one on. Don't remember that happening on any of the other releases. Was never on them solo either though.

Anyway, just something I noticed

Engine Version 19620

RoiEXLab commented 4 years ago

Is it exactly 25 minutes or just round about?

@DanVanAtta we should have logging for this on prerelease, right? So that should tell us the servers perspective of what happened.

DanVanAtta commented 4 years ago

An observation:

I suspect the problem is due to prerelease server deployments, the following are pretty close:

Dec 01 01:53:54 li999-232 run_server[32401]: INFO [2019-12-01 01:53:54,461] org.triplea.server.lobby.chat.MessageBroadcaster: Broadcasting: ServerMessageEnvelope(messageType=PLAYER_JOINED, payload={"chatParticipant":{"playerName":{"value":"beelee"},"playerChatId":{"value":"7198c8a4-2ba3-480d-9da0-e751f743a782"},"isModerator":false}}) Dec 01 02:29:41 li999-232 systemd[1]: Started A high performance web server and a reverse proxy server.

sudo journalctl  --since "2019-11-28 02:00:00" | grep -iE "PLAYER_JOINED.*beelee|Started a high performance"

Without an exact timestamp we're not going to know the problem. We probably should have a user connect to the prerelease lobby and see how long they are connected. I think NGINX restart case is most likely, it could be a rate limit error is killing the keep-alive thread. A screenshot of the exact error would be really helpful as the lobby can potentially disconnect in different ways.

prastle commented 4 years ago

@DanVanAtta Let me know exactly what to test if you need me to. Since I have 1 hosting IP and a high speed sep isp.

DanVanAtta commented 4 years ago

@prastle essentially just join the prerelease lobby, watch for disconnects. See if there are any prerelease deployments around that time if disconnected. The prerelease deployment is about 20 minutes after anything is merged.

Cernelius commented 4 years ago

Based on my experienced, maybe this happens if you are both in the 1.9 and 2.0 lobby, at the same time, but not if you are only in the 2.0 one. So, maybe keep an eye on conflicts between the two.

prastle commented 4 years ago

Conflicts between the two is a null point Cer. Since the next stable will be standalone. @Cernelius

beelee1 commented 4 years ago

Is it exactly 25 minutes or just round about?

@RoiEXLab I only timed it 0nce and was 25 minutes an 32 seconds give or take a few seconds. I was gonna time it again but spaced out the timer. It seems to be consistent time wise just guessing though.

I got this error last time:

Screenshot from 2019-12-01 22-53-29

I have had it drop when it is the only triplea open as well

DanVanAtta commented 4 years ago

Do you have multiple windows open to prerelease @beelee1 ?

beelee1 commented 4 years ago

yea so that error is happening within a couple minutes. Just saw it today for the first time. It's hard to log out as well as it keeps popping up. Gotta be quick on the click :)

beelee1 commented 4 years ago

@DanVanAtta no just the one

DanVanAtta commented 4 years ago

http 429 is rate limiting. The 'fetch-games' endpoint is 5/s

In the logs, we can see that:

Dec 02 04:37:34 li999-232 run_server[32401]: ##.##.##.### - - [02/Dec/2019:04:37:34 +0000] "GET /lobby/games/fetch-games HTTP/1.0" 200 313 "-" "Java/11.0.4" 1
Dec 02 04:37:34 li999-232 run_server[32401]: ##.##.##.### - - [02/Dec/2019:04:37:34 +0000] "GET /lobby/games/fetch-games HTTP/1.0" 200 313 "-" "Java/11.0.4" 1
Dec 02 04:37:34 li999-232 run_server[32401]: ##.##.##.### - - [02/Dec/2019:04:37:34 +0000] "GET /lobby/games/fetch-games HTTP/1.0" 200 313 "-" "Java/11.0.4" 1
Dec 02 04:37:34 li999-232 run_server[32401]: ##.##.##.### - - [02/Dec/2019:04:37:34 +0000] "GET /lobby/games/fetch-games HTTP/1.0" 200 313 "-" "Java/11.0.4" 0
Dec 02 04:37:34 li999-232 run_server[32401]: ##.##.##.### - - [02/Dec/2019:04:37:34 +0000] "GET /lobby/games/fetch-games HTTP/1.0" 200 313 "-" "Java/11.0.4" 1
Dec 02 04:37:34 li999-232 run_server[32401]: INFO  [2019-12-02 04:37:34,357] es.moki.ratelimij.dropwizard.filter.RateLimit429EnforcerFilter: rate-limit key 'rlj:xfwd4###.##.##.###' over limit. HTTP Status 429 returned.

In theory there is only supposed to be one request to that URL every 3 seconds.

DanVanAtta commented 4 years ago

Some more log data (ip address is redacted with '##.##.##.###').

We can see there are 'bursts' to the fetch games listing, below are counts of requests to /fetch-games per timestamp:

admin@li999-232:~$ sudo journalctl -u http_server -n 1000 | grep "##.##.##.###"  | grep "fetch-games" | sed 's/^Dec 02 \(..:..:..\).*/\1/' | sort | uniq -c | sed 's/^\s*//' | grep -v "1 " | sort -n
2 04:31:26
2 04:53:59
2 05:06:26
2 05:06:29
3 04:31:27
3 04:53:44
3 05:06:30
4 04:53:53
6 04:37:34
6 04:53:35
6 04:53:45
6 04:53:47
6 04:53:54
6 05:04:59
6 05:06:16
6 05:06:23
6 05:06:27

Below is sample of any request, we can see the request cadence is 3 seconds:

admin@li999-232:~$ sudo journalctl -u http_server -n 1000 | grep "##.##.##.###" | grep "fetch-games" | sed 's/^Dec 02 \(..:..:..\).*/\1/' 
04:29:10
04:29:13
04:29:16
04:29:19
04:29:22
04:29:25
04:29:28
04:29:31
04:29:34
04:29:38
04:29:40
04:29:43
04:29:46
04:29:49
04:29:52
04:29:55
04:29:59
04:30:01
04:30:04
04:30:07
04:30:10
04:30:13
04:30:17
04:30:19
04:30:22
04:30:25
04:30:28
04:30:31
04:30:34
04:30:37
04:30:40
04:30:43
04:30:46
DanVanAtta commented 4 years ago

Also see this:

Dec 02 05:06:30 li999-232 run_server[32401]: INFO  [2019-12-02 05:06:30,500] org.triplea.server.lobby.chat.ChatSocketController: IP disconnected: /127.0.0.1, CloseReason[1000]
Dec 02 05:06:30 li999-232 run_server[32401]: INFO  [2019-12-02 05:06:30,500] org.triplea.server.lobby.chat.MessageBroadcaster: Broadcasting: ServerMessageEnvelope(messageType=PLAYER_LEFT, payload={"playerName":{"value":"beelee"}})

Couple things to do and/or investigate:

But neither of those are related.

@beelee1 are you able to see 'INFO' logging in the console? To open the console: 'engine preferences' > 'Game' > 'show console' = true

Then verify that 'Normal logging' is set. If you can get the error message, it'd be great to dump your console contents to a gist: https://gist.github.com/

That would give us a point of view of what is happening on the client. It might not be enough info, but it'll be something.

beelee1 commented 4 years ago

@DanVanAtta I may have misunderstood your question

Do you have multiple windows open to prerelease @beelee1 ?

I turned off the internet and started triplea and haven't got the error or been logged out after 30 minutes. I'll try the console thingy

beelee1 commented 4 years ago

as soon as i turned on the internet i got the error within a couple minutes. I'll try again with the console on

DanVanAtta commented 4 years ago

Thanks. My question was more, are you connecting multiple times?

beelee1 commented 4 years ago

well I've started it up ( 16920 ) 6 or 7 times now if that's what you mean by connecting. The console didn't show anything:

Screenshot from 2019-12-01 23-55-55

Also it says I don't have a "Downloaded maps" folder for the preRelease and i can' DL maps. That's probably a separate thing idk. I'm gonna reDL thr preRelease and see what happens

DanVanAtta commented 4 years ago

To fix, plan is to convert the fetchGameListing API call to websocket. That will help ensure we do not run into any rate limits (will fix up noise in the logs), and could be better scaling-wise as updates will be as-needed (and also nearly instant as well instead of up to a 3 second delay)

DanVanAtta commented 4 years ago

This will be fixed with: https://github.com/triplea-game/triplea/pull/5763

DanVanAtta commented 4 years ago

https://github.com/triplea-game/triplea/pull/5763 is merged, really appreciate the testing and error report.