redis / jedis

Redis Java client

Flaky tests #2367

Open walles opened 3 years ago

walles commented 3 years ago

Expected behavior

Running JedisTest.timeoutConnectionWithURI over and over should yield the same result every time.

Actual behavior

I ran it five times from inside IntelliJ. Attempts 1 and 2 passed, 3 failed, 4 and 5 passed.

Steps to reproduce:

Run JedisTest.timeoutConnectionWithURI over and over in your favorite IDE.

Redis / Jedis Configuration

Jedis version:

0ad0b4f08258af9b6c0bc7ff6d4ca1c332f12f6b

Redis version:

Redis server v=6.0.10 sha=00000000:0 malloc=libc bits=64 build=e1b2cac03875a8ff

Java version:

openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment (build 11.0.9+11)
OpenJDK 64-Bit Server VM (build 11.0.9+11, mixed mode)

walles commented 3 years ago

Same thing with JedisTest.timeoutConnection(), but I had to run that test about 15 times before it failed for the first time.

walles commented 3 years ago

To add some structure I wrote this script and ran it on aa0158c17f72ca95eecef41173a5d2cb2d8c202f:

mvn-testalot.py 50

While that is running, you can do...

mvn-testalot.py report target/testalot

... to get an intermediate result. The report starts being relevant once the first two mvn test invocations have completed.

Note that the script just does mvn test over and over, so, unlike in the Makefile / CI, the Redis backend isn't restarted between the runs. This may or may not be skewing the results.
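
For anyone who wants to try the same approach without the script, the loop it describes can be sketched roughly as below. This is a minimal illustration, not the actual mvn-testalot.py: the function name run_repeatedly, the per-run log file naming, and the use of the exit code as the pass/fail signal are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def run_repeatedly(command, runs, log_dir):
    """Run `command` `runs` times, saving each run's output; return the exit codes.

    Hypothetical helper: nothing is restarted between runs, so any backend
    the tests talk to keeps its state from one run to the next.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    exit_codes = []
    for i in range(1, runs + 1):
        log_path = log_dir / f"run-{i:03d}.log"
        with log_path.open("wb") as log:
            # stdout and stderr go to one file so a failed run can be inspected later.
            result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
        exit_codes.append(result.returncode)
    return exit_codes
```

Something like run_repeatedly(["mvn", "test"], 50, "target/testalot") would correspond to the invocation above. Passing a different command list is the place to hook in a restart of the backend between runs, if that is wanted.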

Also, I disabled JedisSentinelPoolTest.returnResourceDestroysResourceOnException() after the first few runs because I felt it was too slow.

Anyway, this found the tests listed above and some others. Looking into some of these might make CI more reliable.
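
How the report step obtains per-test results isn't spelled out here. One plausible source is the JUnit-style XML that Maven Surefire writes under target/surefire-reports, so the sketch below assumes that format and maps each test case to the ., x and E symbols used in the table that follows. The function name outcomes_from_junit_xml is made up for illustration.

```python
import xml.etree.ElementTree as ET


def outcomes_from_junit_xml(xml_text):
    """Map 'classname.name()' to '.', 'x' or 'E' for one JUnit-style XML report."""
    outcomes = {}
    root = ET.fromstring(xml_text)
    # iter() also covers reports whose root is <testsuites> wrapping several suites.
    for case in root.iter("testcase"):
        name = f"{case.get('classname')}.{case.get('name')}()"
        if case.find("error") is not None:
            outcomes[name] = "E"  # test threw an unexpected exception
        elif case.find("failure") is not None:
            outcomes[name] = "x"  # an assertion failed
        elif case.find("skipped") is not None:
            continue  # skipped tests say nothing about flakiness
        else:
            outcomes[name] = "."  # no child element means the test passed
    return outcomes
```

Calling this once per report file after every run gives one dictionary of outcomes per run, which is all a report needs.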

Flaky tests

. = pass, x = fail, E = error

Result Name
x................xx..x........x..............x.. redis.clients.jedis.tests.JedisPoolTest.testCloseConnectionOnMakeObject()
....E.EE.EEE..EEEE.E.EE.E.E.EEEEE.E.E.EEE.EE.EEE redis.clients.jedis.tests.JedisSentinelPoolTest.checkCloseableConnections()
....E.EE.EEE..EEEE.E.EE.E.E.EEEEE.E.E.EEE.EE.EEE redis.clients.jedis.tests.JedisSentinelPoolTest.checkResourceIsCloseable()
.EEEEEEEEEEEEEEEEEEEEEEEE.EEEEEEEEEEEEEEE.EEEEEE redis.clients.jedis.tests.JedisSentinelPoolTest.ensureSafeTwiceFailover()
.xxx redis.clients.jedis.tests.JedisSentinelPoolTest.returnResourceDestroysResourceOnException()
....E.EE.EEE..EEEE.E.EE.E.E.EEEEE.E.E.EEE.EE.EEE redis.clients.jedis.tests.JedisSentinelPoolTest.returnResourceShouldResetState()
.....E.EEEE.EE.E.EE.E.EE...E.E..EE.E.E.EE..EE.EE redis.clients.jedis.tests.JedisSentinelPoolWithCompleteCredentialsTest.checkCloseableConnections()
.....E.EEEE.EE.E.EE.E.EE...E.E..EE.E.E.EE..EE.EE redis.clients.jedis.tests.JedisSentinelPoolWithCompleteCredentialsTest.checkResourceIsCloseable()
.EEEEEE.E..EEEE.E.EEEE.EE.EEEEEE.EEEEEE...E.EE.. redis.clients.jedis.tests.JedisSentinelPoolWithCompleteCredentialsTest.ensureSafeTwiceFailover()
.....E.EEEE.EE.E.EE.E.EE...E.E..EE.E.E.EE..EE.EE redis.clients.jedis.tests.JedisSentinelPoolWithCompleteCredentialsTest.returnResourceShouldResetState()
.................x.............................. redis.clients.jedis.tests.JedisTest.timeoutConnection()
.................xx....................x........ redis.clients.jedis.tests.JedisTest.timeoutConnectionWithURI()
................xxxxxxxxx....................... redis.clients.jedis.tests.ShardedJedisPoolTest.shouldReturnActiveShardsWhenOneGoesOffline()
.................xxxxxx.x....................... redis.clients.jedis.tests.ShardedJedisPoolWithCompleteCredentialsTest.shouldReturnActiveShardsWhenOneGoesOffline()
...............x................................ redis.clients.jedis.tests.commands.ControlCommandsTest.clientPause()
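
A table like the one above can be derived from per-run outcomes along these lines. This is a sketch under the assumption that a test counts as flaky when its outcome is not the same in every run; flaky_report and the shape of its input are hypothetical, not taken from the script.

```python
def flaky_report(runs):
    """Build 'history name' lines for tests whose outcome changed between runs.

    `runs` is a list with one dict per run, mapping a test name to
    '.', 'x' or 'E' (pass, fail, error).
    """
    histories = {}
    for outcomes in runs:
        for name, symbol in outcomes.items():
            histories.setdefault(name, []).append(symbol)
    lines = []
    for name in sorted(histories):
        history = "".join(histories[name])
        if len(set(history)) > 1:  # mixed outcomes: treated as flaky
            lines.append(f"{history} {name}")
    return lines
```

A test that is missing from some runs simply gets a shorter history, which is how a test disabled partway through would end up with fewer symbols than the others.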

github-actions[bot] commented 6 months ago

This issue is marked stale. It will be closed in 30 days if it is not updated.

gerzse commented 5 months ago

@walles It's been two years since you reported this. Thanks for taking the time to check. Nice automation, btw!

I ran your script on the current code base. Of course, in the meantime Redis has evolved, Jedis has evolved... And I did one thing differently: I made sure your script actually runs make test, so the Redis processes get stopped and started again between runs. I suspect a good deal of the errors and failures you saw were because of this.

My results, after 5 runs, are:

Flaky tests, 5 runs

. = pass, x = fail, E = error

Result Name
.x... redis.clients.jedis.commands.jedis.SlowlogCommandsTest.slowlogObjectDetails()

So no errors at least, and much less flakiness. This might be because of restarting Redis, or because the tests have improved in the past two years. I did not try to run it on the old code.

My point is: things seem to look better nowadays, and I'll try to address the flaky tests.