jasonrohrer / OneLife

a multiplayer survival game of parenting and civilization building
http://onehouronelife.com
Other
998 stars 280 forks source link

Merge/Stop Splitting Check on January 1st? #1018

Open DougLefelhocz opened 6 months ago

DougLefelhocz commented 6 months ago

I opened this issue on January 1st. I know, because one of the lineages was Yearby, and some private discord messages where I can see dates. You put some logs here, but not from the new year.

I poorly described the issue, and talked about splitting, and you responded by talking about how splitting works. But, the source of the issue, I suspect, came not from splitting, but what the numbers looked like during the 'merge' (equivalently 'stop splitting') period.

A while back you said:

So it will start splitting at 130, and stop splitting at 63.

Which, I think, were the settings on January 1st.

However, the data I have, linked in the first link above, suggests the merge happened too early, because the total of bs2 and s1 doesn't seem likely to have reached 63. The numbers in the second comment suggest to me another split, which suggests that unintentional ping-ponging happened, because the merge happened too early, instead of at 63 total players.

jasonrohrer commented 6 months ago

Okay, I will post the logs here from that entire period, for closer study. Just a sec.

jasonrohrer commented 6 months ago

Okay, here's the whole year of 2023, plus around the new year of 2024, entirely:

Jan 25, 2023 12:24:38 AM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Jan 25, 2023 12:24:38 AM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Jan 25, 2023 12:20:28 PM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Jan 26, 2023 08:54:02 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Jan 26, 2023 06:32:01 PM 132 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Jan 26, 2023 06:32:09 PM 260 * 0.5 <= 133, 1 servers is not enough, adding another 
Jan 27, 2023 08:33:01 AM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Jan 27, 2023 05:51:19 PM 260 * 0.5 <= 136, 1 servers is not enough, adding another 
Jan 28, 2023 09:29:13 AM 660 * 0.1 >= 43, 2 servers is too many, removing one 
Jan 28, 2023 02:18:57 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Jan 30, 2023 11:19:06 AM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Jan 30, 2023 11:13:07 PM 660 * 0.1 >= 64, 2 servers is too many, removing one 
Jan 30, 2023 11:39:48 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Jan 31, 2023 09:11:25 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Jan 31, 2023 04:28:49 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 01, 2023 09:10:32 AM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Feb 01, 2023 06:43:42 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Feb 01, 2023 06:43:43 PM 260 * 0.5 <= 131, 1 servers is not enough, adding another 
Feb 02, 2023 09:21:55 AM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Feb 02, 2023 12:06:51 PM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Feb 02, 2023 07:53:47 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 02, 2023 07:53:51 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Feb 03, 2023 06:05:32 AM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Feb 07, 2023 09:01:35 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Feb 07, 2023 08:08:58 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 08, 2023 07:02:27 AM 660 * 0.1 >= 63, 2 servers is too many, removing one 
Feb 09, 2023 08:35:01 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 09, 2023 08:35:01 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Feb 09, 2023 10:35:23 PM 660 * 0.1 >= 51, 2 servers is too many, removing one 
Feb 11, 2023 10:43:26 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 12, 2023 09:16:56 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Feb 12, 2023 07:07:01 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Feb 13, 2023 05:36:26 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Feb 27, 2023 08:49:01 AM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Mar 05, 2023 07:13:00 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 06, 2023 02:47:37 AM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Mar 12, 2023 07:04:43 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 13, 2023 04:26:06 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Mar 19, 2023 04:27:07 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Mar 19, 2023 04:27:10 PM 260 * 0.5 <= 131, 1 servers is not enough, adding another 
Mar 20, 2023 09:07:32 AM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Mar 20, 2023 05:33:56 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 21, 2023 10:48:28 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Mar 21, 2023 03:33:03 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 22, 2023 12:24:23 AM 660 * 0.1 >= 63, 2 servers is too many, removing one 
Mar 22, 2023 01:20:51 AM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 22, 2023 06:54:58 AM 660 * 0.1 >= 63, 2 servers is too many, removing one 
Mar 22, 2023 06:09:57 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 23, 2023 08:03:28 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Mar 23, 2023 06:08:30 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 23, 2023 08:19:52 PM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Mar 23, 2023 08:23:57 PM 660 * 0.1 >= 36, 2 servers is too many, removing one 
Mar 23, 2023 09:10:39 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 23, 2023 09:10:39 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Mar 24, 2023 06:33:55 AM 26 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Mar 24, 2023 06:39:25 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Mar 24, 2023 10:33:11 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 24, 2023 10:33:18 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Mar 25, 2023 12:25:05 AM 660 * 0.1 >= 61, 2 servers is too many, removing one 
Mar 25, 2023 02:35:31 AM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 25, 2023 08:47:52 AM 25 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Mar 25, 2023 08:48:42 AM 660 * 0.1 >= 63, 2 servers is too many, removing one 
Mar 25, 2023 07:56:39 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 26, 2023 08:29:23 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Mar 26, 2023 05:04:24 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Mar 27, 2023 04:54:49 AM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Apr 02, 2023 07:42:59 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Apr 02, 2023 07:42:59 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Apr 03, 2023 03:42:31 AM 25 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Apr 03, 2023 04:05:29 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Dec 26, 2023 09:04:59 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Dec 26, 2023 09:05:08 PM 260 * 0.5 <= 131, 1 servers is not enough, adding another 
Dec 27, 2023 06:12:15 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Dec 27, 2023 07:29:07 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Dec 28, 2023 09:01:47 AM 23 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Dec 28, 2023 09:02:17 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Dec 28, 2023 08:14:19 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Dec 29, 2023 07:40:07 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Dec 29, 2023 07:20:30 PM 130 / 260 >= 0.5 for server bigserver2.onehouronelife.com, starting to spread to next server. 
Dec 29, 2023 07:39:32 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Dec 30, 2023 08:29:11 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Dec 30, 2023 08:22:31 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Dec 31, 2023 07:56:13 AM 660 * 0.1 >= 66, 2 servers is too many, removing one 
Jan 01, 2024 07:57:18 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Jan 01, 2024 11:28:03 PM 660 * 0.1 >= 65, 2 servers is too many, removing one 
Jan 02, 2024 01:15:58 AM 260 * 0.5 <= 130, 1 servers is not enough, adding another 
Jan 02, 2024 07:45:22 AM 25 / 260 <= 0.1 for server bigserver2.onehouronelife.com, stopping spreading. 
Jan 02, 2024 07:46:12 AM 660 * 0.1 >= 64, 2 servers is too many, removing one
jasonrohrer commented 6 months ago

So it split at 130, and then unsplit later when the population of both servers together dropped below 65. This seems okay to me.

There was a 3.5 hour gap between the split and unsplit, which also seems fine.

The weird thing is that the population jumped back up to 130 just a few hours later? Actually, 1:45 later? How is that possible?

If you can investigate that, it would be great. Maybe the tally of 65 on Jan 01, 2024 11:28:03 PM is not correct?

jasonrohrer commented 6 months ago

I wonder if one serve showed up as "offline" during that time for some reason, so only one of the servers contributed to the total sum at that moment?

jasonrohrer commented 6 months ago

No, wait, the code handles the case of a server being offline...

        // never STOP spreading if some of our servers are offline
        // we don't have accurate information about player population in
        // that case.  Leave spreading in place to avoid ping-ponging
        // during temporary outages
        // When in doubt, keep spreading.
        // Note that in the case of a long-term outtage, the server should
        // be removed from remoteServerList.ini manually so that it doesn't
        // interfere with stop-spreading detection.
        else if( $totalNumOnline == $totalNumServer &&
                 $curNumServers > 1 &&
                 $activeMaxCap * $stopSpreadingFraction >= $activeCurrentPop ) {

            logMessage( "$activeMaxCap * $stopSpreadingFraction >= $activeCurrentPop, ".
                        "$curNumServers servers is too many, removing one" );
DougLefelhocz commented 6 months ago

I wonder if one serve showed up as "offline" during that time for some reason, so only one of the servers contributed to the total sum at that moment?

This seems possible. I think I read on the discord reports of people seeing more "offline" servers than usual. I sometimes see that for low population servers, but they usually come back up when I ping it again. But, what people were saying on the discord, I think a little before that, seemed different.

jasonrohrer commented 6 months ago

I'm also checking these timestamps against the lifeLogs, which show the population.

jasonrohrer commented 6 months ago

Okay, in the BS2 life log, around the time of this split:

Jan 01, 2024 07:57:18 PM 260 * 0.5 <= 130, 1 servers is not enough, adding another

I'm seeing a BS2 pop peaking around 127.

So the 130 is coming from other people playing on low-pop servers.

Then at the time of this unsplit:

Jan 01, 2024 11:28:03 PM 660 * 0.1 >= 65, 2 servers is too many, removing one

I'm seeing 65 on BS2.... and 50 on S1.

So you're right, there's some kind of bug here!

jasonrohrer commented 6 months ago

I can't figure out the bug in the reflector code after looking at it closely.

I have added some extra logging that should help in the future.

However, we'll unfortunately have to wait for another big population surge AND whatever weird thing happened to Server1 to cause the reflector not to see its population to get more info.

DougLefelhocz commented 5 months ago

Jason that will take a while, you should release all the prisoners to see if it increases the population of the server in order to look at it and get a fix.

If by "prisoners" you mean those in donkeytown, those players do get included in the servers player count on the reflector and in YumLife mod. Also, tutorial players count. The number in the upper right corner of YumLife mod can be one less than the number of players on a server, if you find the correct option (you as a player do not get included in that number).

jasonrohrer commented 2 months ago

Noticed strange behavior in logs around April 28, 2024, where we decided to add an extra server, above 1, repeatedly, without ever removing one. Not sure how the current server count was being decremented. Added more logging and code improvements here:

8a091f9a22a691f72e4632a98c145c6366d90569

0f223bef91d03138f59b8d2b277b85418fe878c9

jasonrohrer commented 2 months ago

I also did a lot of local testing with the stressTestClients, so I could change the number of players on my local test servers at will, and see how the reflector behaved. It seemed to be working correctly in all cases.... but clearly, something's still wrong based on what happened on April 28, 2024.