Closed Daverball closed 1 year ago
This works also for me
@Daverball Did the tests run locally? In GitHub Actions they run for 6 hours before they get killed by the timeout.
The 1.4 tests run fine but the 2.0 ones seem to run into a deadlock.
BTW: According to the contributing policies of the zopefoundation organization you need to sign a contributor agreement before any non-trivial change can be merged. For details please consult the Contributing guidelines for zopefoundation projects.
The tests did complete successfully for 3.10 and 3.9. I did not bother testing all of the permutations locally.
Considering coverage did complete which is essentially doing the same as the 3.9 tests from what I can tell I can't really say what is causing the deadlock on the CI. It doesn't help that it happens before any log output occurs.
I consider this a trivial case, since only the test suite was adjusted slightly, with no changes to production code. signing such a comprehensive agreement for such a small matter is honestly not worth my time or yours.
3.8 tests did complete successfully locally as well. Considering how the SQLAlchemy 2.0 environment is always the last one tox runs anyways I actually can't imagine that the deadlock on the CI has anything to with what I changed. Since before anything it would run the install for earliest SQLAlchemy version and it's not even getting to that point. tox locks up before it has the chance to print even just setting up the first environment.
I consider this a trivial case, since only the test suite was adjusted slightly, with no changes to production code. signing such a comprehensive agreement for such a small matter is honestly not worth my time or yours.
Alternatively for a one-time contribution you might consider to submit your changes in this PR under the ZPL license like it was done in https://github.com/zopefoundation/zope.sqlalchemy/pull/47#issuecomment-733155548
@icemac My best guess for as to why tox is hanging only on the CI right now is that it's hitting some kind of resource limit on the CI runner, although that doesn't really explain why the py310 and py311 runners hang, since they both contain fewer environments.
I could try to double the number of version specific runners and split off a couple of environments each using --skip-env
on one of the runners while specifying the skipped environments on the other, but it might make the github actions config a bit messy.
~Could you trigger a run on master to verify that the tests will still complete successfully, to rule out some sort of regression in a recent release of tox (and/or change in environment on the GH Actions CI runners)?~ (looks like that's still working fine)
@icemac Looks like the 2.0 environment always hangs on the CI, even if it is the only one, but I'm at a complete loss as to why, since the only difference is the SQLAlchemy version pin. It's possible 2.0 is trying to build some C extensions that hang on GitHub Actions, but I would expect to see more intermediary command output if that were actually the case...
I'm also getting some doctest failures on the CI for older SQLAlchemy versions, which I cannot reproduce locally. Do you have any ideas as to why the CI environment would behave so drastically differently?
This is the workflow run with split off environments for reference: https://github.com/seantis/zope.sqlalchemy/actions/runs/4722729734/jobs/8377715445
@Daverball I would like to have added the following change to the PR to see where the tests hang – but I am not allowed to write to your branch (seems you did not allow contributor changes), so could you please do this change?:
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 8f22689..ba688b0 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -61,7 +61,7 @@ jobs:
pip install tox
- name: Test
run: |
- tox -f ${{ matrix.config[1] }}
+ tox -f ${{ matrix.config[1] }} -- -vv
- name: Coverage
if: matrix.config[1] == 'coverage'
run: |
There wasn't really an option for me to allow contributor changes, otherwise I would have enabled it. I have it enabled on all my other pull requests. I was planning to make a new branch and play around with the verbosity options anyways. If this isn't verbose enough to identify the problem I will play around some more.
Ok I think it might actually be GitHub Actions that's causing the output to disappear once it hangs. I followed one of the runners until it got stuck. When I switched to another runner the other run was not showing any output and when I switched back to the one I was following its output had disappeared as well. So looks like the only way to debug is to trigger the action and then follow the output. It looks like testAbortBeforeCommit
hangs on the CI with SQLAlchemy 2.0 and testpg
, since the first environment in the pyX factor will always be without a version pin.
After cancelling the run, the output appears to have shown up again: https://github.com/seantis/zope.sqlalchemy/actions/runs/4729467500/jobs/8392015954
Alright, I see, I didn't look close enough at the tox.ini
. I thought testpg
was running locally as well. So that's the difference, if I flip the check and force testpg
and testpg2
to run with a Postgres database configured locally, then I can reproduce the hang locally. It looks like the hang happens in Postgres itself, because once the test hangs and I abort and then run the tests on a different version, it will hang on the very first test until I restart Postgres.
However, in contrast to the CI, all SQLAlchemy versions will hang on the same test for me, even if I force restart Postgres. So I cannot reproduce the CI behavior for those versions. Even after removing the zope_sqlalchemy_tests
database and recreating it, it will always hang on testAbortBeforeCommit
(or possibly the next test after, if the print happens after the test has been run successfully)
We run zope.sqlalchemy 1.6 (which didn't have the <2 SQLAlchemy version pin yet) on a production database with Postgres and psycopg2 without issue. It's possible we're just never hitting the corner case that would trigger this hang, but that doesn't really explain the inconsistent behavior of older versions between running it locally and on the CI.
For science I tried the new psycopg3 backend on SQLAlchemy 2.0 and it also hangs on the same test, but I am getting a little bit more of a helpful traceback when I Ctrl+C
to abort the hanging tests. Looks like drop_all
in tearDown
is what hangs, so something about this test creates a lock on one of the tables without ever releasing it, which prevents it from being dropped, so the test just waits indefinitely for the table to be dropped in the tearDown
.
I'm guessing the server side transaction never gets aborted/committed somehow.
@icemac It looks like any test that uses more than one commit
will hang, I assume the second commit is bound to the wrong server side Postgres transaction, so it never actually gets commited, which will prevent the test tearDown from doing its things. I will dive into the code to try to see if I can make this more robust somehow.
@icemac Alright I think I figured out the problem, it wasn't the two commits, it's that in SQLAlchemy 2.0 bare engine/connects have transactions as well now without autocommit behavior, so when tests utilize the engine directly to get a new connection, that connection starts a transaction and never gets closed/commited, so we need to wrap it inside a with conn.begin()
.
The cleaner way to do this is probably to get a new connection in testSetUp and close it in tearDown, so the tests that need a connection outside the transaction/session machinery can just do a with self.conn.begin():
this should be backwards compatible with 1.1 and actually prevents the tests from hanging in 2.0.
Now there is one failure left in 2.0 and it is testNestedSessionCommitAllowed
. Looks like for some reason session.in_nested_transaction()
is False
in 2.0 by the time the before_commit
hook gets invoked, it does still resolve to True
before session.commit
is called though. So it looks like session.commit
is unwinding the nested transactions before the hook gets invoked.
@icemac Can you explain to me why session.commit
is fine for a nested transaction but not for a non-nested transaction? session.commit
will still commit the outermost transaction as well, so if that's what you wanted to avoid, that's not what you are doing. Generally nested transactions should make use of the context manager or call commit
/ rollback
etc on the savepoint itself that was returned by begin_nested
.
Edit: Looking at the git blame it looks like this exception was added as a result of #1 which incorrectly assumed how the subtransaction mechanism in SQLAlchemy works. session.begin_nested(); session.commit()
is not really a pattern, because it does not only commit the nested transaction, but the parent transaction as well... So I think it is fine to make that illegal again and force people to use savepoint = session.begin_nested(); savepoint.commit()
in third party code that is not aware of the savepoint mechanism in transaction
.
@icemac It's probably not possible to make the examples that are being run using doctest cross compatible with both <1.4 and 2.0, without making them unreadable, I think it's probably fair to skip doctest for older versions, since the examples in the docs should reflect the most recent SQLAlchemy version.
@icemac I took the liberty of also adding support for the psycopg3 backend while I was at it. This adds two additional test scripts, which will only be created and executed for SQLAlchemy 2.0 and test the two variations of Postgres with the new backend.
@icemac Until the situation mentioned in https://github.com/zopefoundation/zope.sqlalchemy/pull/79#issuecomment-1516069841 is clarified I've decided to skip the failing test in 2.0. I think it's fine to make session.commit
illegal in this case starting with SQLAlchemy 2.0. Personally I'd argue it should always have been illegal, but I don't know the reasoning behind this exception.
With this all tests are passing, although there are a couple of warnings. ~The LegacyAPIWarning should be fine to ignore for now. Although it would be easy to rewrite that query to not use the get
method.~ (fixed) The other warnings are psycopg3 specific and are probably related to the laissez-faire way the session connections are accessed in the threading tests and won't occur in a more sane and careful example. So we could probably ignore those too, although they are a bit more concerning than the other warnings. (I haven't been able to figure out which tests trigger this warning, it's possible it's because not all the sessions in the tests get properly closed, there's a recipe for a testRunner in the SQLAlchemy docs which ensures connections/sessions are always cleaned up properly in the tearDown, but I'm not sure it's backwards compatible. But it is certainly something we can try to switch to. Either way it is not a critical warning and it's certainly not caused by zope.transaction, but rather the tests themselves)
@Daverball wrote:
Can you explain to me why
session.commit
is fine for a nested transaction but not for a non-nested transaction?
Sorry I am not able to explain this. I'd like to go with your suggestion to prevent it and force to use savepoint.commit()
as long it is documented in the change log.
Additionally I am fine with running the doctests only against SQLAlchemy 2.0.
I've not yet been able to look into the code changes. Additionally I need a statement to resolve https://github.com/zopefoundation/zope.sqlalchemy/pull/79#issuecomment-1507958750
I've not yet been able to look into the code changes. Additionally I need a statement to resolve #79 (comment)
Yes, I have been holding off until all the changes have been reviewed and this is ready to be merged, so there's no ambiguity about which version of the code the declaration is valid against.
@icemac I managed to fix the ResourceWarning
in psycopg v3 and i updated the changelog. I am happy with the current state of the PR now, so feel free to start your review, as soon as you find yourself with enough time to do so. Once the changes have been approved I will comment with the declaration and a create a snapshot of the PR with the internet archive, so you can merge it.
@icemac Just pinging you again, in case this slipped through the cracks.
It seems we need packaging
as new dependency.
To get rid of the problems in the lint job, you can run tox -e isort-apply
It seems we need
packaging
as new dependency. To get rid of the problems in the lint job, you can runtox -e isort-apply
Both of these issues have been fixed.
@icemac I explicitly submit the contributions in this PR under the existing ZPL License.
And here is a snapshot with the declaration on the internet archive: http://web.archive.org/web/20230524070616/https://github.com/zopefoundation/zope.sqlalchemy/pull/79
I just released https://pypi.org/project/zope.sqlalchemy/3.0/ including these changes.
Functionally there doesn't appear to be anything that will make zope.sqlalchemy stop working on 2.0. In fact we've already had it running on 1.6 for a couple of months now with no problems.
I fixed the few tests that would've failed with SQLAlchemy 2.0 so that you can support 2.0 with confidence.