zopefoundation / zope.sqlalchemy

Integration of SQLAlchemy with transaction management
Other
32 stars 34 forks source link

Adds support for SQLAlchemy 2.0 and psycopg >=3 #79

Closed Daverball closed 1 year ago

Daverball commented 1 year ago

Functionally there doesn't appear to be anything that will make zope.sqlalchemy stop working on 2.0. In fact we've already had it running on 1.6 for a couple of months now with no problems.

I fixed the few tests that would've failed with SQLAlchemy 2.0 so that you can support 2.0 with confidence.

bouillon commented 1 year ago

This works also for me

icemac commented 1 year ago

@Daverball Did the tests run locally? In GitHub Actions they run for 6 hours before they get killed by the timeout.

icemac commented 1 year ago

The 1.4 tests run fine but the 2.0 ones seem to run into a deadlock.

icemac commented 1 year ago

BTW: According to the contributing policies of the zopefoundation organization you need to sign a contributor agreement before any non-trivial change can be merged. For details please consult the Contributing guidelines for zopefoundation projects.

Daverball commented 1 year ago

The tests did complete successfully for 3.10 and 3.9. I did not bother testing all of the permutations locally.

Considering coverage did complete which is essentially doing the same as the 3.9 tests from what I can tell I can't really say what is causing the deadlock on the CI. It doesn't help that it happens before any log output occurs.

Daverball commented 1 year ago

I consider this a trivial case, since only the test suite was adjusted slightly, with no changes to production code. signing such a comprehensive agreement for such a small matter is honestly not worth my time or yours.

Daverball commented 1 year ago

3.8 tests did complete successfully locally as well. Considering how the SQLAlchemy 2.0 environment is always the last one tox runs anyways I actually can't imagine that the deadlock on the CI has anything to with what I changed. Since before anything it would run the install for earliest SQLAlchemy version and it's not even getting to that point. tox locks up before it has the chance to print even just setting up the first environment.

icemac commented 1 year ago

I consider this a trivial case, since only the test suite was adjusted slightly, with no changes to production code. signing such a comprehensive agreement for such a small matter is honestly not worth my time or yours.

Alternatively for a one-time contribution you might consider to submit your changes in this PR under the ZPL license like it was done in https://github.com/zopefoundation/zope.sqlalchemy/pull/47#issuecomment-733155548

Daverball commented 1 year ago

@icemac My best guess for as to why tox is hanging only on the CI right now is that it's hitting some kind of resource limit on the CI runner, although that doesn't really explain why the py310 and py311 runners hang, since they both contain fewer environments.

I could try to double the number of version specific runners and split off a couple of environments each using --skip-env on one of the runners while specifying the skipped environments on the other, but it might make the github actions config a bit messy.

~Could you trigger a run on master to verify that the tests will still complete successfully, to rule out some sort of regression in a recent release of tox (and/or change in environment on the GH Actions CI runners)?~ (looks like that's still working fine)

Daverball commented 1 year ago

@icemac Looks like the 2.0 environment always hangs on the CI, even if it is the only one, but I'm at a complete loss as to why, since the only difference is the SQLAlchemy version pin. It's possible 2.0 is trying to build some C extensions that hang on GitHub Actions, but I would expect to see more intermediary command output if that were actually the case...

I'm also getting some doctest failures on the CI for older SQLAlchemy versions, which I cannot reproduce locally. Do you have any ideas as to why the CI environment would behave so drastically differently?

This is the workflow run with split off environments for reference: https://github.com/seantis/zope.sqlalchemy/actions/runs/4722729734/jobs/8377715445

icemac commented 1 year ago

@Daverball I would like to have added the following change to the PR to see where the tests hang – but I am not allowed to write to your branch (seems you did not allow contributor changes), so could you please do this change?:

diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 8f22689..ba688b0 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -61,7 +61,7 @@ jobs:
         pip install tox
     - name: Test
       run: |
-        tox -f ${{ matrix.config[1] }}
+        tox -f ${{ matrix.config[1] }} -- -vv
     - name: Coverage
       if: matrix.config[1] == 'coverage'
       run: |
Daverball commented 1 year ago

There wasn't really an option for me to allow contributor changes, otherwise I would have enabled it. I have it enabled on all my other pull requests. I was planning to make a new branch and play around with the verbosity options anyways. If this isn't verbose enough to identify the problem I will play around some more.

Daverball commented 1 year ago

Ok I think it might actually be GitHub Actions that's causing the output to disappear once it hangs. I followed one of the runners until it got stuck. When I switched to another runner the other run was not showing any output and when I switched back to the one I was following its output had disappeared as well. So looks like the only way to debug is to trigger the action and then follow the output. It looks like testAbortBeforeCommit hangs on the CI with SQLAlchemy 2.0 and testpg, since the first environment in the pyX factor will always be without a version pin.

After cancelling the run, the output appears to have shown up again: https://github.com/seantis/zope.sqlalchemy/actions/runs/4729467500/jobs/8392015954

Daverball commented 1 year ago

Alright, I see, I didn't look close enough at the tox.ini. I thought testpg was running locally as well. So that's the difference, if I flip the check and force testpg and testpg2 to run with a Postgres database configured locally, then I can reproduce the hang locally. It looks like the hang happens in Postgres itself, because once the test hangs and I abort and then run the tests on a different version, it will hang on the very first test until I restart Postgres.

However, in contrast to the CI, all SQLAlchemy versions will hang on the same test for me, even if I force restart Postgres. So I cannot reproduce the CI behavior for those versions. Even after removing the zope_sqlalchemy_tests database and recreating it, it will always hang on testAbortBeforeCommit (or possibly the next test after, if the print happens after the test has been run successfully)

We run zope.sqlalchemy 1.6 (which didn't have the <2 SQLAlchemy version pin yet) on a production database with Postgres and psycopg2 without issue. It's possible we're just never hitting the corner case that would trigger this hang, but that doesn't really explain the inconsistent behavior of older versions between running it locally and on the CI.

Daverball commented 1 year ago

For science I tried the new psycopg3 backend on SQLAlchemy 2.0 and it also hangs on the same test, but I am getting a little bit more of a helpful traceback when I Ctrl+C to abort the hanging tests. Looks like drop_all in tearDown is what hangs, so something about this test creates a lock on one of the tables without ever releasing it, which prevents it from being dropped, so the test just waits indefinitely for the table to be dropped in the tearDown.

I'm guessing the server side transaction never gets aborted/committed somehow.

Daverball commented 1 year ago

@icemac It looks like any test that uses more than one commit will hang, I assume the second commit is bound to the wrong server side Postgres transaction, so it never actually gets commited, which will prevent the test tearDown from doing its things. I will dive into the code to try to see if I can make this more robust somehow.

Daverball commented 1 year ago

@icemac Alright I think I figured out the problem, it wasn't the two commits, it's that in SQLAlchemy 2.0 bare engine/connects have transactions as well now without autocommit behavior, so when tests utilize the engine directly to get a new connection, that connection starts a transaction and never gets closed/commited, so we need to wrap it inside a with conn.begin().

The cleaner way to do this is probably to get a new connection in testSetUp and close it in tearDown, so the tests that need a connection outside the transaction/session machinery can just do a with self.conn.begin(): this should be backwards compatible with 1.1 and actually prevents the tests from hanging in 2.0.

Now there is one failure left in 2.0 and it is testNestedSessionCommitAllowed. Looks like for some reason session.in_nested_transaction() is False in 2.0 by the time the before_commit hook gets invoked, it does still resolve to True before session.commit is called though. So it looks like session.commit is unwinding the nested transactions before the hook gets invoked.

Daverball commented 1 year ago

@icemac Can you explain to me why session.commit is fine for a nested transaction but not for a non-nested transaction? session.commit will still commit the outermost transaction as well, so if that's what you wanted to avoid, that's not what you are doing. Generally nested transactions should make use of the context manager or call commit/ rollback etc on the savepoint itself that was returned by begin_nested.

Edit: Looking at the git blame it looks like this exception was added as a result of #1 which incorrectly assumed how the subtransaction mechanism in SQLAlchemy works. session.begin_nested(); session.commit() is not really a pattern, because it does not only commit the nested transaction, but the parent transaction as well... So I think it is fine to make that illegal again and force people to use savepoint = session.begin_nested(); savepoint.commit() in third party code that is not aware of the savepoint mechanism in transaction.

Daverball commented 1 year ago

@icemac It's probably not possible to make the examples that are being run using doctest cross compatible with both <1.4 and 2.0, without making them unreadable, I think it's probably fair to skip doctest for older versions, since the examples in the docs should reflect the most recent SQLAlchemy version.

Daverball commented 1 year ago

@icemac I took the liberty of also adding support for the psycopg3 backend while I was at it. This adds two additional test scripts, which will only be created and executed for SQLAlchemy 2.0 and test the two variations of Postgres with the new backend.

Daverball commented 1 year ago

@icemac Until the situation mentioned in https://github.com/zopefoundation/zope.sqlalchemy/pull/79#issuecomment-1516069841 is clarified I've decided to skip the failing test in 2.0. I think it's fine to make session.commit illegal in this case starting with SQLAlchemy 2.0. Personally I'd argue it should always have been illegal, but I don't know the reasoning behind this exception.

With this all tests are passing, although there are a couple of warnings. ~The LegacyAPIWarning should be fine to ignore for now. Although it would be easy to rewrite that query to not use the get method.~ (fixed) The other warnings are psycopg3 specific and are probably related to the laissez-faire way the session connections are accessed in the threading tests and won't occur in a more sane and careful example. So we could probably ignore those too, although they are a bit more concerning than the other warnings. (I haven't been able to figure out which tests trigger this warning, it's possible it's because not all the sessions in the tests get properly closed, there's a recipe for a testRunner in the SQLAlchemy docs which ensures connections/sessions are always cleaned up properly in the tearDown, but I'm not sure it's backwards compatible. But it is certainly something we can try to switch to. Either way it is not a critical warning and it's certainly not caused by zope.transaction, but rather the tests themselves)

icemac commented 1 year ago

@Daverball wrote:

Can you explain to me why session.commit is fine for a nested transaction but not for a non-nested transaction?

Sorry I am not able to explain this. I'd like to go with your suggestion to prevent it and force to use savepoint.commit() as long it is documented in the change log.

Additionally I am fine with running the doctests only against SQLAlchemy 2.0.

icemac commented 1 year ago

I've not yet been able to look into the code changes. Additionally I need a statement to resolve https://github.com/zopefoundation/zope.sqlalchemy/pull/79#issuecomment-1507958750

Daverball commented 1 year ago

I've not yet been able to look into the code changes. Additionally I need a statement to resolve #79 (comment)

Yes, I have been holding off until all the changes have been reviewed and this is ready to be merged, so there's no ambiguity about which version of the code the declaration is valid against.

Daverball commented 1 year ago

@icemac I managed to fix the ResourceWarning in psycopg v3 and i updated the changelog. I am happy with the current state of the PR now, so feel free to start your review, as soon as you find yourself with enough time to do so. Once the changes have been approved I will comment with the declaration and a create a snapshot of the PR with the internet archive, so you can merge it.

Daverball commented 1 year ago

@icemac Just pinging you again, in case this slipped through the cracks.

icemac commented 1 year ago

It seems we need packaging as new dependency. To get rid of the problems in the lint job, you can run tox -e isort-apply

Daverball commented 1 year ago

It seems we need packaging as new dependency. To get rid of the problems in the lint job, you can run tox -e isort-apply

Both of these issues have been fixed.

Daverball commented 1 year ago

@icemac I explicitly submit the contributions in this PR under the existing ZPL License.

Daverball commented 1 year ago

And here is a snapshot with the declaration on the internet archive: http://web.archive.org/web/20230524070616/https://github.com/zopefoundation/zope.sqlalchemy/pull/79

icemac commented 1 year ago

I just released https://pypi.org/project/zope.sqlalchemy/3.0/ including these changes.