Closed Pandrex247 closed 1 year ago
The check is there to verify the asynchronous-ity of the reactive calls, i.e. that the request was not blocking and the future was not done because of the synchronous blocking request. It is possible the computers are too fast these days to make the async request and the tests should be adjusted to check this in a better way.
Eh, by blocking I mean blocking the main thread since the rx calls are meant to be in separate threads.
+1 to this. I was never able to get the TCK to pass with RESTEasy on GitHub Actions because of these issues.
Open Liberty also sees failures in these tests quite often in certain test environments.
We should indeed change these tests to avoid checking doneness this way. If the goal is to verify execution in another thread (asynchronously) we should then store thread info in the call and compare it to the main thread's. In any case, the checks for done in this way should be dropped.
Given the most recent comment, what should the outcome of this challenge be? Normally, these tests would be excluded from the 3.1.x TCK and then added back into a future version when the implementation has been updated.
@brideck We will exclude those tests and fix them in a future release
Specification:
Jakarta RESTful Web Services
Challenged Test(s):
TCK Version:
3.1.1
Tested implementation:
Payara 6.2022.1.Alpha3 (RC2 tag)
Description:
When run on certain environments (e.g. an AWS instance with 2 vCPUs), a number of these tests fail sporadically due to "Future cannot be done, yet!". The failures are not consistent, implying some sort of timing issue - they happen frequently enough though that we haven't managed to get a clean run of all the tests in this class.
Looking at the way these tests work, they appear to be a bit optimistic with their timings. There's no coded delay, sleep, wait or equivalent mechanism between submitting an asynchronous task and checking if it hasn't completed yet.
Most of the tests run in a similar way, so I'll take
getWithGenericResponseTest
as an example. Line 211. This just calls a method with the same name in the parent class here: JAXRSClientIT.java (different package) Line 446This steps through like so:
future.get()
, does an assert on that the future is not done.With that established, what I see on the AWS instance is that the asynchronous get request sometimes completes and returns successfully before the call to
!future.isDone()
, meaning the test fails.Putting some log statements to print out the
System.nanoTime()
in, I get the following results locally (where the test always passes):So there's roughly 11-12 milliseconds between the task getting submitted and checking if it hasn't completed yet, with the actual asynchronous task completing within around 33-34ms - not much margin for error.
Running the same scenario against our AWS instance with only 2 vCPUs, these are the timings from one of the runs where it fails:
As you can see (if you squint closely), the asynchronous get returned within roughly 6 milliseconds before the
!future.isDone()
check happens roughly 18 milliseconds later (so 12 milliseconds after the async call has returned), meaning the test "failed" even though functionally it worked. We can even see this if we look at the logs from the test itself:Conclusion/ TLDR: the
ee.jakarta.tck.ws.rs.jaxrs21.ee.client.rxinvoker.JAXRSClientIT#checkFutureOkResponse
andee.jakarta.tck.ws.rs.jaxrs21.ee.client.rxinvoker.JAXRSClientIT#checkFutureString
methods should be adjusted since they're too optimistic, either with sort some of wait (e.g. in the resource methods themselves), or by simply dropping the!future.isDone()
since I'm not actually sure what that's meant to be testing ¯\_(ツ)_/¯