scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0
70 stars 42 forks source link

AsyncioConnection: fix initialize_reactor when called in event loop #327

Closed Lorak-mmk closed 2 months ago

Lorak-mmk commented 2 months ago

Previously, if executed within existing asyncio loop, driver would take the loop, assume it's not used and start it in a separate thread.

Additionally, if executed outside of loop, driver would create a new one and make it default for calling thread.

Those behaviors are wrong so they are changed. Now driver creates its own loop and executes it in a thread. Code that handled pid changes, which can happen when class is transferred using e.g. multiprocessing, is fixed too - previously it didn't create new thread after such transition.

fruch commented 2 months ago

@Lorak-mmk

How did you test this ?

We desperately need a test that works with asyncio, in a similar fashion as test.py

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

Lorak-mmk commented 2 months ago

@Lorak-mmk

How did you test this ?

Manually:

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs. And regarding current bug, I predicted there will be problems with running it inside existing loop: https://github.com/scylladb/python-driver/pull/260#discussion_r1370139175 https://github.com/scylladb/python-driver/pull/271#pullrequestreview-1738692100 Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

fruch commented 2 months ago

@Lorak-mmk How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax) we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us if we declared it experimental.

I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough. for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

Lorak-mmk commented 2 months ago

@Lorak-mmk How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Ok, I'll try to add some tests tomorrow.

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax) we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us if we declared it experimental.

I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough. for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

So let's make it more battle tested and fix issues that we find. How to run dtests with asyncio backend?

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

fruch commented 2 months ago

@Lorak-mmk How did you test this ?

Manually:

  • with a reproducer that directly calls initialize_reactor that was posted on the issue
  • by creating a session in async def that was passed to asyncio.run
  • by veryfying the behavior of class parameters and asyncio when passed to a different process via multiprocessing

We desperately need a test that works with asyncio, in a similar fashion as test.py

I can add a test for this case, it shouldn't be too hard

so lets try

Ok, I'll try to add some tests tomorrow.

Anyhow we still shouldn't make asyncio as a fallback, and align to upstream behavior until we have a clear plan of how to test and properly validate this event loop mode

I think I disagree. All integration tests passed with asyncio, it has the advantage of not requiring additional binary dependencies, and I think we are unaware of any other bugs.

that's exactly the problem, it's not battle proven enough yet (not by us, and not by datastax) we have similar situation with the cqlengine, that we didn't touched almost at all, it would have been easier for us if we declared it experimental. I still think we should revert that call, and align with upstream, it too risky and it would backfire on our core unit tests first, and on user and customer right after, the integration suite doesn't really cover it enough. for example one experiment we can do, it run all of dtest with this backend, for sure we gonna find more issues.

So let's make it more battle tested and fix issues that we find. How to run dtests with asyncio backend?

patching this place, would do the trick: https://github.com/scylladb/scylla-dtest/blob/44605a072659649a5d420442b635a2fd733b1731/dtest_setup.py#L475

And regarding current bug, I predicted there will be problems with running it inside existing loop: #260 (comment) #271 (review) Unless more bugs caused by asyncio are reported I don't think it's worth it to disable it completely.

fruch commented 2 months ago

@Lorak-mmk

running gating tests with it (it's a big portion of all of the dtest) https://github.com/scylladb/scylla-dtest/pull/4400

fruch commented 2 months ago

@Lorak-mmk

we can install this PR into dtest like that:

pip install git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection

and dtest pipeline can use that trick, with the following parameter

DRIVER_VERSION='git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection',

I've send out a gating run with this PR:

it probably not gonna teach us anything new, since dtest isn't using asyncio on it's end.

@roydahan @yaronkaikov it would be nice if we can apply similar trick to test.py and byo pipelines i.e. passing the driver we want to use in the tests, and then we can harness dtest / test.py into the CI drivers as needed

roydahan commented 2 months ago

Sounds good to me. What's needed to be done in test.py in order to support it?

fruch commented 2 months ago

Sounds good to me. What's needed to be done in test.py in order to support it?

something along the lines of:

DRIVER_INSTALLATION_CMD=pip install git+https://github.com/Lorak-mmk/python-driver.git@fix-asyncioconnection
./dbuild bash -c "${DRIVER_INSTALLATION_CMD}; test.py ..."

and away to pass from the pipeline arguments the actual version up into the command it self (in dtest there are a few layer until it get there

I did it manually once with test.py, but it was from local repo, so I need to add a mount point. but it is how people can test fixed manually nowadays (no that anyone know exactly how)

fruch commented 2 months ago

@Lorak-mmk result from dtest, show the exactly same SSL related issue we seen before this fix.

so as expected, it didn't told us anything new about if this fix is good enough.

I think one might want to test it with test.py, if you didn't come up with build a specific test for it.

mykaul commented 2 months ago

Who owns the action item here? We need to see it solved!

Lorak-mmk commented 2 months ago

Who owns the action item here? We need to see it solved!

It's not that high priority now - issue in Scylla was solved by publishing ARM wheels, and I think we are going to mark asyncio backend as experimental anyway until https://github.com/scylladb/python-driver/issues/330 is solved. I'm writing regression test now, we should be able to merge this PR when I'm finished.