Closed: dreis2211 closed this issue 3 years ago
Maybe a stupid question, but is there the possibility to "grep" over the failed build scans on ge.spring.io to get a more complete list of flaky tests?
There is indeed and it's really useful. Here are all the test failures over the last 7 days sorted with flaky tests first.
I thought I'd stabilized the Cassandra tests with a timeout increase, but one [failed again today with a 10s timeout](https://ge.spring.io/s/53c6qrchhmhny/tests/:spring-boot-project:spring-boot-test-autoconfigure:test/org.springframework.boot.test.autoconfigure.data.cassandra.DataCassandraTestIntegrationTests/didNotInjectExampleService()#1).
Oh, that is lovely - I was hoping that Gradle Enterprise had such a feature. Thanks for sharing that view, @wilkinsona.
From a gut feeling - and this might be wrong - most of the failures are related to some sort of timeouts, right? I wonder if the parallelism - as much as it helps - creates some more pressure on the system overall that leads to more timeouts. Given that you did an amazing job of tweaking the task caches, is this maybe something to play around with?
I think another common theme among the flaky tests is that many of them use Docker. Of the five listed above, four of them use Docker and I think parallelism could be part of the cause.
When I was working on the build migration, allowing Gradle to create one worker per core made things really unstable, with many Docker-related failures. One worker per two cores seems to work well on our development machines, at least. My MacBook Pro has 16 cores, so I have the following in `~/.gradle/gradle.properties`:

```
org.gradle.workers.max=8
```
We configure the max workers to 4 on CI as they have, IIRC, 8 cores. We could try tuning this down, but I'd prefer not to slow everything down to avoid a problem that's at least somewhat Docker specific. I'm tempted to go through another round of timeout increases and see how it goes.
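Timeout increases in tests like these usually mean polling for a condition with a longer deadline, rather than sleeping for a fixed time. A minimal pure-Java sketch of that pattern (the helper name and polling interval are illustrative, not Spring Boot's actual test utilities):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

public final class WaitUntil {

    // Polls the condition until it is true or the deadline passes.
    // A longer deadline makes slow, Docker-backed tests less flaky
    // without slowing down the happy path, because the method returns
    // as soon as the condition holds.
    public static boolean ready(BooleanSupplier condition, Duration timeout)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(100); // illustrative polling interval
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        Instant start = Instant.now();
        // Condition becomes true after ~300ms; a 10s deadline returns early.
        boolean ok = ready(() -> Duration.between(start, Instant.now()).toMillis() > 300,
                Duration.ofSeconds(10));
        System.out.println(ok); // prints true
    }
}
```

Raising the `timeout` argument in such a helper slows down only the failing case, which is why a timeout increase is a relatively cheap first response to flakiness.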
The Docker theme reminded me of something. I wonder if it would help to use the newer versions of the respective container images as well.
I saw that for almost every image there are newer (patch) versions available. (There are also newer major and minor versions available here and there, but that might be too aggressive)
| Image | Current | Latest |
| --- | --- | --- |
| Cassandra | 3.11.2 | 3.11.10 |
| Mongo | 4.0.10 | 4.0.23 |
| Redis | 4.0.6 | 4.0.14 |
Neo4J and Couchbase should already be on the latest patch versions.
Let me know if I should give this a test.
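For reference, the proposed tags written out as full image references, in a pure-Java sketch (the class and method here are hypothetical; this thread doesn't show how the build actually pins its image versions):

```java
import java.util.Map;

public final class ImagePins {

    // Proposed patch-level upgrades from the table above.
    static final Map<String, String> PINNED = Map.of(
            "cassandra", "3.11.10",
            "mongo", "4.0.23",
            "redis", "4.0.14");

    // Builds a fully qualified image reference, e.g. "redis:4.0.14".
    static String imageRef(String image) {
        String tag = PINNED.get(image);
        if (tag == null) {
            throw new IllegalArgumentException("no pin for " + image);
        }
        return image + ":" + tag;
    }

    public static void main(String[] args) {
        PINNED.keySet().forEach(image -> System.out.println(imageRef(image)));
    }
}
```

Pinning an explicit tag like this (rather than `latest`) keeps test runs reproducible while still letting patch upgrades be made deliberately.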
This is probably a better test failures link. It adds the `CI` tag so it filters out failures on our development machines, where things may be failing as we're iterating on a new feature.
Yes please, @dreis2211. Upgrading those 3 sounds like a good idea to me.
I also noticed that apparently the libraries in `spring-boot-parent` didn't get a `bomr` run lately. There is a Testcontainers update to 1.15.2. Let me know if I should create a PR for the update or if you want to run `bomr`.
I'll run Bomr on all three maintained branches.
I've made a couple of changes today related to flaky tests:
Things seem to have settled down quite a bit recently, so I'll close this one now. We can take a look again in the future if we start noticing a rise in flakiness again.
`CouchbaseAutoConfigurationIntegrationTests` is flaky again. I've seen it fail several times in the recent past. Reopening to look at it again.
@daschl suggests that we enable debug logging for `com.couchbase`. That'll help identify why the bucket isn't ready.
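A sketch of that logging change, assuming the integration tests pick up a Spring Boot `application.properties` from the test classpath (the actual mechanism in this build may differ):

```
# Hypothetical src/test/resources/application.properties entry:
# verbose Couchbase SDK logging, to see why the bucket isn't ready.
logging.level.com.couchbase=DEBUG
```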
@daschl also suggested that upgrading to the latest Couchbase driver could help. I haven't seen a single flaky test since the upgrade, so I am going to close this one again.
Hi,
both locally and on CI I encounter flaky tests relatively frequently. Most of the time I can tell that they're flaky and ignore them, but it happens often enough that I spend (and waste) time identifying whether those failures are caused by my changes.
Probably not a complete list, but here's what I noticed lately:
Subjectively, the JDK 15 pipeline is a bit flakier, but that might be a false lead.
Anyhow - I wonder if we can do anything about those. I remember that you already did an awesome job of increasing timeouts here and there and tweaked the Testcontainers startup attempts, but I think we're past the Testcontainers stage in most of the cases mentioned above.
Cheers, Christoph