future: cass future wait timed

muzarski commented 4 months ago

ref: https://github.com/scylladb/cpp-rust-driver/issues/42

This PR implements a cass_future_wait_timed API function which allows to wait for a future to resolve within a given period of time.

Pre-review checklist

[x] I have split my patch into logically separate commits.
[x] All commit messages clearly explain what they change and why.
[x] PR description sums up the changes and reasons why they should be introduced.
[x] I have enabled appropriate tests in .github/workflows/build.yml in gtest_filter.
[x] I have enabled appropriate tests in .github/workflows/cassandra.yml in gtest_filter.

muzarski commented 4 months ago

Rebased on master.

muzarski commented 4 months ago

The async test I enabled started to fail (even when I only rebased on master). I can't reproduce it locally:

for scylla tests, my local version of ccm for some reason can't create a 5.0.0 scylla cluster
for cassandra - the test passes

I'll update an integration test to make use of cass_future_wait instead of cass_future_wait_timed to see if the issue is with the timed version itself, or not.

muzarski commented 4 months ago

I changed wait_timed to wait in integration test. There are multiple client timeout errors:

 /home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
      Expected: CASS_OK
To be equal to: wait_code
      Which is: CASS_ERROR_LIB_REQUEST_TIMED_OUT [Request timed out]
Request timed out: Request timeout: Request took longer than 12000ms: deadline has elapsed

Link to the job: https://github.com/scylladb/cpp-rust-driver/actions/runs/10201719963/job/28224244570?pr=153#step:7:8477.

Recently, we merged a PR that sets a default client timeout to 12000ms (which is huge!). I'm not sure why the timeouts appear - these are simple insert queries. Link to the test case: https://github.com/scylladb/cpp-rust-driver/blob/master/tests/src/integration/tests/test_async.cpp#L99-L111.

I'll try to investigate further, but it seems that the test failure is not related to cass_future_wait_timed. The test fails due to client request timeouts. As I mentioned previously, the test passes for me locally. Maybe it's some GH actions issue - query timing out after 12s is really weird.

cc: @Lorak-mmk @wprzytula

Lorak-mmk commented 3 months ago

The async test I enabled started to fail (even when I only rebased on master). I can't reproduce it locally:
* for scylla tests, my local version of ccm for some reason can't create a 5.0.0 scylla cluster

What is the error? also we should update to newer version / test with more than 1 version.

* for cassandra - the test passes
I'll update an integration test to make use of cass_future_wait instead of cass_future_wait_timed to see if the issue is with the timed version itself, or not.

Lorak-mmk commented 3 months ago

I changed wait_timed to wait in integration test. There are multiple client timeout errors:
 /home/runner/work/cpp-rust-driver/cpp-rust-driver/tests/src/integration/objects/future.hpp:152: Failure
      Expected: CASS_OK
To be equal to: wait_code
      Which is: CASS_ERROR_LIB_REQUEST_TIMED_OUT [Request timed out]
Request timed out: Request timeout: Request took longer than 12000ms: deadline has elapsed
Link to the job: https://github.com/scylladb/cpp-rust-driver/actions/runs/10201719963/job/28224244570?pr=153#step:7:8477.

Recently, we merged a PR that sets a default client timeout to 12000ms (which is huge!). I'm not sure why the timeouts appear - these are simple insert queries. Link to the test case: https://github.com/scylladb/cpp-rust-driver/blob/master/tests/src/integration/tests/test_async.cpp#L99-L111.

I'll try to investigate further, but it seems that the test failure is not related to cass_future_wait_timed. The test fails due to client request timeouts. As I mentioned previously, the test passes for me locally. Maybe it's some GH actions issue - query timing out after 12s is really weird.

cc: @Lorak-mmk @wprzytula

So previously there was no client request timeout at all? And now that you introduced it, queries are timing out? Tests are running in valgrind, so it's possible for queries to take a longer time, but it shouldn't be this long. When running locally, do you also use valgrind?

muzarski commented 3 months ago

So previously there was no client request timeout at all? And now that you introduced it, queries are timing out? Tests are running in valgrind, so it's possible for queries to take a longer time, but it shouldn't be this long. When running locally, do you also use valgrind?

Good thing that you noticed that. I missed it. Indeed, when I run the tests locally under valgrind, the timeouts appear.

muzarski commented 3 months ago

The async test I enabled started to fail (even when I only rebased on master). I can't reproduce it locally:
* for scylla tests, my local version of ccm for some reason can't create a 5.0.0 scylla cluster
What is the error?

Nvm, it's not ccm issue. I was just missing openjdk-11 locally which is needed to setup 5.0.0 cluster apparently.

also we should update to newer version / test with more than 1 version.

What version would you suggest? 5.4.8?

Lorak-mmk commented 3 months ago

The async test I enabled started to fail (even when I only rebased on master). I can't reproduce it locally:
* for scylla tests, my local version of ccm for some reason can't create a 5.0.0 scylla cluster
What is the error?
Nvm, it's not ccm issue. I was just missing openjdk-11 locally which is needed to setup 5.0.0 cluster apparently.

I recommend using ccm by nix, it removed a lot of my frustrations with ccm.

also we should update to newer version / test with more than 1 version.

What version would you suggest? 5.4.8?

6.0

muzarski commented 3 months ago

v2: addressed review comments

There are still timeouts appearing. The conclusion is that the timeouts appear due to tests being run under valgrind (reproducing locally as well). I see two solutions here:

Run the timeout-sensitive tests in a separate CI step without valgrind (right now, there is only one failing test due to timeouts).
Modify the failing test so it disables the global client timeout, and waits for the future indefinitely (change wait_timed to wait).

I think the former approach is better, as it allows us to test cass_future_wait_timed. WDYT? @Lorak-mmk @wprzytula

Lorak-mmk commented 3 months ago

v2: addressed review comments

There are still timeouts appearing. The conclusion is that the timeouts appear due to tests being run under valgrind (reproducing locally as well). I see two solutions here:
* Run the timeout-sensitive tests in a separate CI step without `valgrind` (right now, there is only one failing test due to timeouts).

* Modify the failing test so it disables the global client timeout, and waits for the future indefinitely (change `wait_timed` to `wait`).
I think the former approach is better, as it allows us to test cass_future_wait_timed. WDYT? @Lorak-mmk @wprzytula

In second approach, if you use wait instead of wait_timed, then how does it test wait_timed?

muzarski commented 3 months ago

v2: addressed review comments There are still timeouts appearing. The conclusion is that the timeouts appear due to tests being run under valgrind (reproducing locally as well). I see two solutions here:
* Run the timeout-sensitive tests in a separate CI step without `valgrind` (right now, there is only one failing test due to timeouts).

* Modify the failing test so it disables the global client timeout, and waits for the future indefinitely (change `wait_timed` to `wait`).
I think the former approach is better, as it allows us to test cass_future_wait_timed. WDYT? @Lorak-mmk @wprzytula
In second approach, if you use wait instead of wait_timed, then how does it test wait_timed?

That's the point - it does not. That's why I prefer to run this specific test without valgrind.

Lorak-mmk commented 3 months ago

Sorry, I misread your message. Separate step complicates the CI, but in this case it seems reasonable to me.

muzarski commented 3 months ago

v3: running the failing async test in a separate CI step without valgrind

I'll update Scylla version in CI in a separate PR.

muzarski commented 3 months ago

v3.1: deduplicate error message variable in callback test

muzarski commented 3 months ago

v3.2: fix CI step description for timeout-sensitive cassandra integration tests

muzarski commented 3 months ago

Rebased on master

muzarski commented 3 months ago

Rebased on master

muzarski commented 3 months ago

v4: added proper synchronization between cass_future_wait and cass_future_wait_timed callers.

muzarski commented 2 months ago

v4.1:

rebased on master -> adjusted to new CI
addressed @wprzytula comments

muzarski commented 1 month ago

v4.2:

rebased on master
adjusted commit message that @Lorak-mmk mentioned

scylladb / cpp-rust-driver

future: cass future wait timed #147

Pre-review checklist