haskell-hvr / token-bucket

Haskell rate limiter library using lazy token bucket algorithm
https://hackage.haskell.org/package/token-bucket
GNU General Public License v3.0

token-bucket-0.1.0.1 test suite failure #3

Open peti opened 9 years ago

peti commented 9 years ago

Citing from http://hydra.cryp.to/build/879128/nixlog/1/raw:

Running 1 test suites...
Test suite test-tb: RUNNING...
testing tocket-bucket...
running 10+1 iterations with 10.0 Hz rate-limit...
...FAILED! (effective rate was 10.066741933749606 Hz)
Test suite test-tb: FAIL

hvr commented 9 years ago

I probably need to rethink the acceptance tolerance for the tests, as the system time precision (especially in VMs) makes such tests prone to false positives :-(

spwhitton commented 7 years ago

Hi! I packaged token-bucket for Debian. We're seeing this issue when we build token-bucket on an armel host.

If this test failure does not indicate a bug in token-bucket, could you write a patch loosening the tolerance, please? Or indicate where I should look to write one myself.

Thanks.

Kritzefitz commented 6 years ago

Hi,

I have been researching this problem and came to the conclusion that, while it is not a real bug in the library, it also doesn't happen just because of an imprecise clock. The machine I used to reproduce the problem reports a clock resolution (clock_getres(2)) of 1e-9 seconds, the same as on most x86 machines where this test runs fine.

Even if the clock is imprecise it seems unlikely to me that it would cause failures like this. If all you have is an imprecise clock, how can you observe that the clock is imprecise using that imprecise clock? As the clock in question is the only means by which the library and the test code can observe time, that time is, by definition, correct, no matter what it is or how fast it is moving forward.

The actual reason for this failure apparently has to do with the test code not taking the start and end times of the test correctly, combined with how long the process takes to execute certain instructions relative to others. The interesting events in the first test run are roughly these:

a. The token bucket is created
b. The start time of the first test is taken
c. The first token is allocated
d. The second token is allocated
e. ...
f. The last token (i.e. the 11th) is allocated
g. The end time is taken

The test code checks that g - b is at least one second. When the token bucket is created at a, it already holds one allocatable token, which is then allocated at c. The second token, allocated at d, becomes available 0.1 seconds after the last time recorded in the token bucket, which is the bucket's creation time recorded at a (assuming the code took less than 0.1 seconds to run from a to d, which is very likely on pretty much any hardware). That means the time passing between a and b already contributed to the creation of the second token, but because it passes before the start time is taken at b, it looks as if the token bucket allocated tokens too fast.
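The skew can be sketched with a tiny pure model (all names here are illustrative, not the library's actual API): a bucket created at time t0 with rate r starts with one token, and token k becomes available at t0 + (k-1)/r. If the test's start time is taken a few milliseconds after the bucket is created, the measured window for the 11 tokens comes out shorter than the full second the bucket actually needed:

```haskell
-- Hypothetical pure model of the timing skew; not the token-bucket API.
-- Token k (counting from 1) becomes available at t0 + (k-1)/r.
tokenReadyTime :: Double -> Double -> Int -> Double
tokenReadyTime t0 r k = t0 + fromIntegral (k - 1) / r

main :: IO ()
main = do
  let r       = 10.0                      -- 10 Hz rate limit
      t0      = 0.0                       -- bucket created (event a)
      tStart  = 0.005                     -- start time taken slightly later (event b)
      t11     = tokenReadyTime t0 r 11    -- 11th token ready one second after a
      elapsed = t11 - tStart              -- what the test measures (g - b)
  print t11              -- time of the last allocation relative to a
  print elapsed          -- less than 1.0, so the window looks too short
  print (10 / elapsed)   -- apparent rate, slightly above 10 Hz
```

With a 5 ms head start the apparent rate lands in the same ballpark as the 10.066 Hz reported in the failing build log above.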

Given that explanation, it seems like the test should fail equally on all hardware, but evidently it passes on most. This is because the time between f and g is measured by the test code as part of the test, even though the desired number of tokens has already been allocated by f. That extra time recorded by the test seems to balance out the lost time between a and b on most hardware. However, if the processor needs more time to execute the instructions between a and b than those between f and g, the balance tips the other way and the test fails.
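As a back-of-the-envelope formula (my own notation, not taken from the test code): if dAB is the time spent between a and b and dFG the time between f and g, then for an ideal 10 Hz bucket the test measures a window of 1.0 - dAB + dFG seconds, which stays at or above one second exactly when dFG >= dAB:

```haskell
-- Illustrative model: the measured window is the bucket's true one second,
-- minus the head time lost before the start timestamp (a..b),
-- plus the tail time after the last allocation (f..g).
measuredWindow :: Double -> Double -> Double
measuredWindow dAB dFG = 1.0 - dAB + dFG

main :: IO ()
main = do
  print (measuredWindow 0.002 0.003 >= 1.0)  -- True: tail outweighs head, test passes
  print (measuredWindow 0.003 0.002 >= 1.0)  -- False: head outweighs tail, test fails
```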

I was able to make the test pass on the hardware I care about by moving the creation of the bucket (a) closer to the start of the test (b) with this patch, but that doesn't really solve the underlying problem; it just shifts the timing to work out in favour of the test.

A real fix would be to create the token bucket after taking the start time of the test. But then the test always measures slightly more time than the token bucket actually needed to generate the requested number of tokens, which might cause the test to pass in situations where it shouldn't. I also don't see a nice way to make this exact, as taking the start time and creating the token bucket are always separated by at least a short amount of time.
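In the same toy model as above (again, my names, not the test's), creating the bucket only after taking the start time flips the error to the safe side: the measured window now additionally includes the gap between the start timestamp and the bucket's creation, so it only ever overestimates; the test can no longer fail spuriously, but it may pass when it shouldn't:

```haskell
-- Illustrative: dBA is the gap between taking the start time and creating
-- the bucket, dFG the tail after the last allocation. The measurement now
-- only ever overestimates the bucket's true one-second window.
measuredWindowFixed :: Double -> Double -> Double
measuredWindowFixed dBA dFG = 1.0 + dBA + dFG

main :: IO ()
main =
  -- For any non-negative gaps the measured window is at least one second.
  print (all (>= 1.0) [ measuredWindowFixed x y
                      | x <- [0, 0.001, 0.01]
                      , y <- [0, 0.001, 0.01] ])  -- True
```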