ikolomi opened 1 month ago
glide.cluster.CommandTests.fcall_readonly_function(): FAILURE 0.212s
CommandTests > fcall readonly function FAILED
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
at app//glide.cluster.CommandTests.fcall_readonly_function(CommandTests.java:1672)
glide.cluster.CommandTests.fcall_readonly_function
Fixed in #2350.
GLIDE core Rust tests are flaky: https://github.com/valkey-io/valkey-glide/actions/runs/11297525637/job/31424597153?pr=2439
standalone_client_tests::test_read_from_replica_round_robin_do_not_read_from_disconnected_replica
Node tests time out from time to time, and the failure point changes between runs.
https://github.com/valkey-io/valkey-glide/actions/runs/11477149214/job/31938639591?pr=2500
Java CI
Regarding the flaky GLIDE core Rust test (standalone_client_tests::test_read_from_replica_round_robin_do_not_read_from_disconnected_replica): fixed in one of my open PRs.
https://github.com/valkey-io/valkey-glide/actions/runs/11612733982/job/32336884037?pr=2558
Script tests are timing out in Node. The tests contain at least 1500 ms of sleep, so if the machine is even slightly under load and the tests run slowly or wait, they fail. Tests with long sleeps need higher timeouts. There is also a loop with a 4000 ms timeout limit, so when the test fails, both the framework and the test report a timeout. Either the test timeout needs to be higher, or the tests need less sleep.
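One way to get "less sleep during test" could look like the following minimal sketch: instead of sleeping for a fixed duration and then asserting, the test polls a condition until it holds or a deadline passes, so a fast run finishes early while a slow, loaded machine still gets the full budget. The helper name `waitForCondition`, its parameters, and the 50 ms default interval are hypothetical; they are not part of valkey-glide or its test utilities. The framework's per-test timeout would still need to be larger than `timeoutMs`.

```typescript
// Hypothetical helper (not part of valkey-glide): poll `condition` every
// `intervalMs` until it returns true or `timeoutMs` has elapsed.
// Resolves to true if the condition was met, false on timeout.
async function waitForCondition(
    condition: () => boolean | Promise<boolean>,
    timeoutMs: number,
    intervalMs = 50,
): Promise<boolean> {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (await condition()) {
            return true;
        }
        // Short sleep between checks instead of one long fixed sleep.
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
    // Final check at the deadline, so a condition that became true during
    // the last sleep is not reported as a timeout.
    return await condition();
}
```

In a test, `expect(await waitForCondition(() => scriptDone, 4000)).toBe(true)` would replace a fixed sleep followed by an assertion (`scriptDone` is a placeholder for whatever state the test checks).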
https://github.com/valkey-io/valkey-glide/actions/runs/11612834623/job/32337281537
Occasionally, tests on the self-hosted runners fail with: Error: File was unable to be removed Error: EACCES: permission denied, unlink ...
This is probably caused by tests interfering with each other on the same host.
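If the interference comes from test runs on the same host sharing the same file paths (an assumption, not confirmed above), one possible mitigation is to give each run its own scratch directory. The sketch below uses Node's `fs.mkdtempSync`, which appends six random characters to the given prefix, and `fs.rmSync` for cleanup. The helper names and the `glide-test-` prefix are hypothetical and do not come from the valkey-glide test suite. This would not help if the EACCES errors are caused by file ownership (for example, files created by a different user).

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper: create a unique scratch directory for one test run,
// so concurrent runs on the same self-hosted host never touch each other's files.
function createIsolatedWorkDir(prefix: string): string {
    // mkdtempSync appends six random characters to the prefix, so two calls
    // with the same prefix still yield different directories.
    return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}

// Hypothetical helper: remove a run's scratch directory and everything in it.
function removeIsolatedWorkDir(dir: string): void {
    // recursive removes the whole tree; force ignores an already-missing path.
    fs.rmSync(dir, { recursive: true, force: true });
}
```

Each run would then write its cluster files and logs under its own directory and remove only that directory when it finishes.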
https://github.com/valkey-io/valkey-glide/actions/runs/11643552677/job/32424432651#step:5:1010
https://github.com/valkey-io/valkey-glide/actions/runs/11643552677/job/32424430639#step:5:1062
Tests with the wait command in Node are always flaky. It appears that the pace of the run affects the result: sometimes a command has already arrived at one of the replicas, and sometimes it has not. In multi, the client doesn't wait for a response on the wait command; it returns immediately.
Since Node runs in a single process, it makes sense that sometimes a command arrives at a replica right before wait, and occasionally the pipeline is already clean, especially when running in a very slow environment.
These commands need to be removed from the transaction tests.
Generally, as can be seen in the full run and from the minimal testing we have had up to this point, we're seeing many failures. Some are the result of flakiness, and some are simply bugs that weren't tested. @ikolomi I think we might need to freeze for one day, divide the suites among the team, and stop the run until we know we are green. I don't think releasing while we are red is something we want to accept. The decision you make now is what every RM will do: you are the first, and you'll set the tone. Please consider it.
Describe the bug
More often than not, CI tests require rerunning in order to pass. In addition, warnings can be seen during the tests. This situation is unhealthy and builds a sense of uncertainty regarding the code quality. We need an owner for CI testing who will be able to maintain this system and make it flourish.
Expected Behavior
All tests pass in CI, with no unexplained warnings in the logs.
Current Behavior
Tests are flaky and require reruns, and there are unexplained logs and warnings during the tests.
Reproduction Steps
Happens during PR checks
Possible Solution
No response
Additional Information/Context
No response
Client version used
Don't care
Engine type and version
Don't care
OS
Don't care
Language
TypeScript
Language Version
Don't care
Cluster information
No response
Logs
No response
Other information
No response