Correct the problem of ping computing from the server

karnute commented 6 years ago

The ping, as seen from the server, is computed from acknowledge time averaged from last 32 messages. Since origins (Quake3) the differences were computed from deltas (between sent and acked) in svs.time, but that time is only increased at each snapshot in server. The snapshot increment time is int(1000/sv_fps), as sv_fps is currently locked at 20, the resulting increment is 50ms. Nowadays, latency bellow 50ms is very common. Also the acknowledge time was saved every time the message and duplicates were read, resulting in wrong deltas. For clients with a real latency bellow 50 ms, there was always one packet (sent in the same snapshot) with delta=0 and 31 packets with delta=50, so the integer part of the average was always resulting 48 as ping. The ping computed for each client is used by the server to apply a lag compensation in the position of other players sent to each client, so the wrong ping could affect the hits. Additionally, this patch includes a change in the average of deltas to improve against sporadic sort bursts in latency. This is done by averaging frequency (inverse of delta) and then computing corresponding latency by the inverse of frequency.

karnute commented 6 years ago

Uhm, some ugly old merges appear as new commits in this pull, but only the last 2 commits have real new changes.

danielepantaleone commented 6 years ago

You should rebase your branch onto mickael9/urt and push only the commits you want to be merged

karnute commented 6 years ago

Thanks Daniele, I think that it is correct now after rebase. I always get confused with pushes from local and branches.

ThomasBrierley commented 6 years ago

For such a small diff this looks like quite a substantive fix :smiley: I feel like this might explain all of the squidgy feeling hits on UrT since the sv_fps was set to 20 ?...

From what you have discovered It would seem the sv_fps change would have made it more likely for this bug to affect people severely (stuck at 48ms) and reduced accuracy of estimates of those with ping >50ms by quantising the diffs with a larger interval. I can imagine the behaviour of the mean flipping between quite different values based on small fluctuations in latency.

I suppose using quantised diffs isn't necessarily incorrect approach statistically though provided the interval is short enough, and the window long enough for the mean to be reasonable... I wonder if the length of the window of packets (32) was chosen for that reason.

Also smoothing lag spikes with reciprocals is a nice touch.

karnute commented 6 years ago

This patch could alleviate (or not) the hits, but the ping problem occurred also with any value of sv_fps (although worst with low values), so this patch may not solve the glitches observed with higher values of sv_fps. The erroneous ping computing was there since Quake3 engine release. I am not sure but the sv_fps default in UrT could be 20 since then. Latency some years ago was higher in general, so it was not perceived as a problem to compute it with large granularity.

ThomasBrierley commented 6 years ago

Sorry I am an edit-aholic, I think you responded to my previous revision. But we seem to be thinking the same thing, you put it more succinctly:

the ping problem occurred also with any value of sv_fps (although worst with low values)

Latency some years ago was higher in general, so it was not perceived as a problem to compute it with large granularity

Anyway I feel positive about the potential impact of this patch :)

karnute commented 6 years ago

I suppose using quantised diffs isn't necessarily incorrect approach statistically though provided the interval is short enough, and the window long enough for the mean to be reasonable... I wonder if the length of the window of packets (32) was chosen for that reason.

Yes, in theory computing quantized (not too large) with a window can be approximate for high latency, but for latency sorter than steps and overwritting acknowledge times every read of msg was causing the fixed 48 ping error.

ThomasBrierley commented 6 years ago

I was thinking about this a bit more and I think there might be another factor which further fits the narrative here: As average latency has decreased over the years, variance usually reduces also, additionally higher bandwidth increases latency stability even further for shared connections...

The previous mean calculation was relying on that noise to push and pull each diff over the interval threshold at a probability corresponding with the "real" mean to within 1/32 of an interval at best. But this only works well if the variance is at least ± 1/2 of the interval (25ms and assuming unrealistic uniformity), which means the length of that 32 packet window wasn't really doing much for most modern connections because a reduced variance makes it so much more likely to be sitting in a single 50ms buckets for the whole window, resulting in a mean error closer to the worst case of 50 - t % 50ms for sv_fps=20.

This could easily produce very intermittent behaviour if your real latency is hovering around a multiple of 50, just under the boundary you get the lowest error and just above you get the worst. I can't be sure this is the reason but this explanation fits my experienced behaviour on many occasions.

But now that they are reasonably accurate diffs we can forget all that stuff :P I'm just trying to understand it is all.

karnute commented 6 years ago

(ThomasBrierley in FrozenSand/ioq3-for-UrbanTerror-4/pull/64) I put a test server together last night building off your branch and it was all working nicely, but then I discovered it was stuck in ffa mode O_o (g_gametype ineffective) which I don't think is going to attract many players.

I tested this branch patch (ping_calc) built for Linux 64 bits, with set g_gametype "3" without any problem.

ThomasBrierley commented 6 years ago

Strange, is that with the dedicated server? I was trying gametype 7 (CTF)

I found that it was advertising as CTF from the browser perspective and when connected rcon g_gametype would even return 7 but the actual mode was always stuck in ffa, it even contradictingly stated ffa in the server info dialogue... I tried latching it back and forth between different game types between restarting the map but always ffa.

I am not very experienced in administrating UrT servers so I may well be doing something wrong, however I also tried barbatos's build which does not have this issue.

I git log greped the following suspicious commits which comment on reverting previous commits due to being unable to change g_gametype and assumed it was responsible but I haven't actually attempted to bisect it properly yet so I could be wrongly attributing it to them. not that this explains why your local builds work :confused: ...

738465d

Revert my recent cvar latch changes My cvar latch system changes prevent the Game VM from changing g_gametype when the value is out of range due to it being registed in the engine. It's been pointed out as fragile method of security, which was still exploitable, by Noah Metzger (Chomenor). It doesn't seem like this is working out to be a good solution.

The issue of fs_game '..' on server being relicated on client via systeminfo exploit is still fixed as it's not affected by latch. There are a few cases from current values of fs_game are used which ideally should use fs_gamedir char array which has been validated.

Revert "Don't let VMs change engine latch cvars immediately" Partially revert "Fix fs_game '..' reading outside of home and base path" Revert "Fix VMs forcing engine latch cvar to update to latched value"

That's the last of a short run of commits by zturtleman but not all of the commits are fully reverted:

3638f69 - not fully reverted (2nd in list above) adef4e6 - not reverted 3a6af1b - not reverted

I'm really not familiar with what's going on in these so i haven't successfully been able to revert it all yet, due to subsequent changes preventing automated revert.

I should probably wait to see how you have avoided it or try a bisect before going any further.

[edit]

I've built with debian9 64bit both locally and remotely on a fresh VPS with the same results. Also built barbatos's repo under same conditions but without the issue. Also tried mickael9/urt with same issue.

karnute commented 6 years ago

About your questions related to the original wrong ping computing, as I said above for latency less than 50ms it was stuck in 48 ping. Curiously, it could be less than 48 in some very pathological cases with a lot of lost packets (that will not be counted in the 32 pool) and successive last acks of more than one previous message were arriving with much delay in the same snapshot. This problem is partially solved in my patch by the "if" inserted in line 1497 of code/server/sv_client.c, that avoids other acks after the first to overwrite the real response time (only with that correction, pings below 50ms would show as exact 0 almost always, if the server cpu is not overloaded).

For any latency greater than 50ms, the variance around mean value was not very important. What previous code measured was only the mean probability of every packet last ack arriving in one or other snapshot, and that probability is proportional to the multiple of latency respect to 50ms (think on the integer part of the sum of 32 integers times 50/32). So the average ping value was more or less accurate except for small increase in variance due to the 50/32=1,56 factor increase for a variation of every 1ms (something like if one real variation were +2ms the ping would change +3ms). Also, pathological cases could arise if duplicated packets were traveling by different routes with very different latency (this is avoided again by the "if" in patch). The substitution of svs.time by Sys_Milliseconds() solves the overall inaccuracy and at the same time (and more important) it solves the problem for real latency below 50ms (or any other snapshot time from sv_fps).

Finally, using an average of time intervals has the problem of the great influence of sort bursts of delays in networks (large spikes). To address this problem, it is usual to average the frequency instead of intervals, and convert average frequency into latency again by the inverse.

There is a marginally residual of error in the network latency estimation. It is the average time spent by a busy client to acknowledge a message. Although, this is not relevant for UrT, as that time will be what the client is really taking to "see" messages from server snapshot. The client is only listening to network messages during the free time in each frame. If the client spend almost all frame time (= int(1000/com_maxfps), usually 8ms) to render the frame, then it will only listen network in one short point every frame. So, usually a busy client will add an average of 4ms. to its response time, and that number is exactly what I found in my localhost tests (dedicated server) with simulated latency.

karnute commented 6 years ago

About your problem with the server in ffa mode. To be sure, maybe you could download directly the source of my branch with the patch from https://github.com/karnute/ioq3/archive/ping_calc.zip and compile it in your system.

ThomasBrierley commented 6 years ago

What previous code measured was only the mean probability of every packet last ack arriving in one or other snapshot.

Ah ok i get it, I was operating on a misunderstanding of what that window of packets contained.

Finally, using an average of time intervals has the problem of the great influence of sort bursts of delays in networks (large spikes). To address this problem, it is usual to average the frequency instead of intervals, and convert average frequency into latency again by the inverse.

Yeah I really like this simple little method, reciprocal or reciprocals seems to pop up all over the place, it's also how electrical impedance is calculated.

About your problem with the server in ffa mode. To be sure, maybe you could download directly the source of my branch with the patch from https://github.com/karnute/ioq3/archive/ping_calc.zip and compile it in your system.

Yup that's what I did, I cloned your repo directly and checked out the ping_calc branch. but as I said I could also reproduce the issue in mickael9's repo in the urt branch.

When you tested your server binary did you go beyond the browser and actually try to play the game type? as I found it advertised correctly but actually ran ffa.

karnute commented 6 years ago

In my tests, I run dedicated server with a cfg that changes g_gametype and then do a devmap. The client is connected to localhost directly (not through browser).

Are you sure the server you are connecting is the test one? or is it other in the same IP (maybe launched previously and already running)?

ThomasBrierley commented 6 years ago

Yup i'm sure and I did the same test locally with g_gametype set in the q3ut/server.cfg with the exec server.cfg command line option: Running one dedicated server (in local mode) and connecting to 0.0.0.0. But the problem is that it's clearly registering the gametype somehow but is stuck in ffa anyway (since rcon g_gametype will return 7)

Would it be possible to upload your binaries so I can see if it's a config or environment difference vs a build difference?, I will be back online in about an hour.

karnute commented 6 years ago

Yes, I uploaded my binaries to https://goo.gl/AUQ4U2 they are for Linux 64 bits. Just tested it with g_gametype=7 in devmap Riyadh without problem. The bots were going to defend their flag and to capture mine ;-) My ping was under 4 as expected.

ThomasBrierley commented 6 years ago

Our builds were identical... so I reverted to the default server.cfg which worked, and bisected to find... something... weird: #30 :laughing:

Sorry for the digression, I'll try put the server up again and be less greedy with the naming :smile:

ThomasBrierley commented 6 years ago

Thanks for your help with that.

I've never spent so long trying to name a server (that bug is very picky). Anyway it's a UK server, I called it "SMOOTHY CTF" under the guise of providing smoother gameplay :) with a mostly generic setup, but I increased max players to 32 to see how it handles it (wishful thinking perhaps, maybe I should add a post on the forums to drum up some test subjects).

[edit]

:disappointed: I turned it off again because that bug occurs again depending on the map.

ThomasBrierley commented 6 years ago

I put it up again with a new build using both your and my patches. Not getting any infostring warnings anymore so I'll leave it running:

http://urt.so/s/728343288

danielepantaleone commented 6 years ago

This looks good to me. I haven't tested it personally so I would wait a bit before merging. Let me know if the testing report any issue.

KroniK907 commented 6 years ago

@ThomasBrierley can you post your build? Assuming it's a linux x64 build.

ThomasBrierley commented 6 years ago

http://139.162.251.183/renderer_opengl1_x86_64.so MD5: 255436cb9041593add32eb53cba8d193 http://139.162.251.183/renderer_opengl2_x86_64.so MD5: 5411a2d052fa2c0b4dbca2dae1afecd6 http://139.162.251.183/urbanterror-m9.x86_64 MD5: bc4a012627e8192566f9bbb508a5c9d1 http://139.162.251.183/urbanterror-server-m9.x86_64 MD5: 91026c74e3c4d7ae213aa7eaad8bebc7

Those aren't permanent but I will keep them there for at least a few days (they may be overwritten).

Note in above build I shortened the version cvar even more than in that patch since it still wasn't enough, I removed the PLATFORM_NAME and PRODUCT_DATE string

ThomasBrierley commented 6 years ago

I've only managed to get to a max of 3v3 on my public server so far, but it's always been smooth with no complaints so far, I also had a fairly equal 1v1 match against someone with a ~300ms ping. I've never seen such a smooth netgraph, for pings less than 48ms those triangles start turning into pixels :D

I think to get better testing (i.e higher numbers) we'd need the help of some existing popular servers, mine is just too new and obscure. Let us know how your testing went @KroniK907

KroniK907 commented 6 years ago

@ThomasBrierley I just got it running on my server. My servers are jump servers though, so they wont be much better than yours.

KroniK907 commented 6 years ago

Is this ever going to be merged or did this not turn out to be a useful fix?

mickael9 / ioq3

Correct the problem of ping computing from the server #29