ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
21 stars 32 forks source link

Replace Portalocker #370

Closed jburel closed 1 year ago

jburel commented 1 year ago

Replace internal portalocker by the upstream version. This is the first step to be able to upgrade to Python 3.8+ (windows)

sbesson commented 1 year ago

0735974 is fully disabling Windows testing. Is it on purpose or is additional work planned to selectively exclude the backend OMERO.tables tests on Windows?

sbesson commented 1 year ago

All CI tests are now successful and the OMERO.tables are skipped on Windows. I assume this will be assessed in a round to functional testing. Trying to identify a few scenarios which be reviewed in a deployment with this PR included:

Coming back briefly to the code changes and specifically https://github.com/ome/omero-py/pull/370/commits/a77749df58d1797f5ecc2af80694ea1078b6dc47, https://github.com/ome/openmicroscopy/pull/3431/ introduced a lot of logic specifically for handling OMERO.tables on Windows. Is it worth looking into reverting these changes as part of this clean-up to simplify our future maintenance? Or limit this to the minimal amount of changes and capture as an issue?

jburel commented 1 year ago

I would rather keep this PR as simple as possible i.e. remove dependency so we can use Python 3.8+ We can look at refactoring in a follow-up PR.

joshmoore commented 1 year ago

:+1: for the change in a post-Windows world but also for cleaning up the changes (in the future).

jburel commented 1 year ago

issue created https://github.com/ome/omero-py/issues/373

Now that CI is back @pwalczysko @will-moore could you go over the scenarios highlighted above i.e.

will-moore commented 1 year ago

OK - I remember we had issues with async parallel reading of tables before - see https://github.com/ome/omero-web/pull/25 and that was "fixed" client-side by avoiding async requests. Was portalocker supposed to prevent that scenario? Is the replacement expected to do better at that than portalocker? Is that a good testing scenario for this PR?

jburel commented 1 year ago

We still use portalocker but this time it is the upstream version.

will-moore commented 1 year ago

I'm going to create a Table on merge-ci to test, but I'm not really sure what pass/fail looks like? If portalocker doesn't work, do we expect to see corrupted files or some other clue? If the existing behaviour - e.g. omero::InternalException at https://forum.image.sc/t/server-error-after-populating-metadata-for-images-using-web-script/27844/14 is a "pass", what's a fail?

will-moore commented 1 year ago

Testing with Josh's script at https://forum.image.sc/t/server-error-after-populating-metadata-for-images-using-web-script/27844/14 gives the same results on merge-ci and on latest-ci (but not sure that even tests this PR)?

jburel commented 1 year ago

As part of a general clean-up, it will be good to integrate this PR in the next release

will-moore commented 1 year ago

@jburel - thinking about a release now, should we get this merged?

jburel commented 1 year ago

I think we can. any objections @joshmoore @sbesson?

sbesson commented 1 year ago

Have all the scenarios mentioned in https://github.com/ome/omero-py/pull/370#issuecomment-1588774666 been validated? It's not obvious from the following reviews in this PR

will-moore commented 1 year ago

It's not clear to me how to test concurrent access to OMERO config or OMERO.tables and what does a pass or fail look like?

We probably need a server without portalocker enabled and a clear idea of how to demonstrate something bad happening (file corruption?) if we want to be sure that this is prevented by portalocker.

sbesson commented 1 year ago

It's not clear to me how to test concurrent access to OMERO config or OMERO.tables and what does a pass or fail look like?

For OMERO config, a minimal test is to call omero config edit in a terminal. This should lock the XML file. In another terminal to make a call to read/write operation on the configuration file.

With the current OMERO.py release, I get

$ omero config edit
Could not acquire lock on /opt/omero/OMERO.current/etc/grid/config.xml

OMERO.tables will likely need a custom script testing something a similar scenario where a table is being locked and accessed at the same time by another process.

will-moore commented 1 year ago

With this branch installed locally, this works - (running this in 2 terminals at the same time):

$ omero config edit
Could not acquire lock on /Users/wmoore/Desktop/OMERO.server-5.6.5-ice36-b233/etc/grid/config.xml

Going on the tests as a guide, tried this...

$ touch test_lock
$ python

>>> import omero.hdfstorageV2 as storage_module
>>> import threading
>>> lock = threading.RLock()
>>> HdfStorage = storage_module.HdfStorage
>>> hdf1 = HdfStorage("/Users/wmoore/Desktop/PY/omero-py/test_lock", lock)
>>> hdf2 = HdfStorage("/Users/wmoore/Desktop/PY/omero-py/test_lock", lock)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wmoore/Desktop/PY/omero-py/target/omero/hdfstorageV2.py", line 203, in __init__
    self.__hdf_file = HDFLIST.addOrThrow(file_path, self, read_only)
  File "/Users/wmoore/Desktop/PY/omero-py/target/omero/util/decorators.py", line 94, in with_lock
    return func(*args, **kwargs)
  File "/Users/wmoore/Desktop/PY/omero-py/target/omero/hdfstorageV2.py", line 126, in addOrThrow
    raise omero.LockTimeout(
omero.LockTimeout: exception ::omero::LockTimeout
{
    serverStackTrace = None
    serverExceptionClass = None
    message = Path already in HdfList: /Users/wmoore/Desktop/PY/omero-py/test_lock
    backOff = 0
    seconds = 0
}

Also tried this, which didn't throw an Exception, but not sure if this should?

# conn.connect() etc...
r = conn.getSharedResources()
t = r.openTable(omero.model.OriginalFileI(2154), conn.SERVICE_OPTS)
t2 = r.openTable(omero.model.OriginalFileI(2154), conn.SERVICE_OPTS)

Also just realised that this is testing server-side code, whereas I only have this PR installed client-side.

will-moore commented 1 year ago

Any more testing needed? How do I connect to merge-ci server? merge-ci-devspace.openmicroscopy.org?

$ omero login
Previously logged in to localhost:4064 as will
Server: [localhost:4064]merge-ci-devspace.openmicroscopy.org
Username: [will]user-3
Password:
InternalException: Failed to connect: Ice.ConnectionLostException:
recv() returned zero
sbesson commented 1 year ago

Also just realised that this is testing server-side code, whereas I only have this PR installed client-side.

You're correct that the portalocker usage in OMERO.tables is fully server-side and I have not found a reliable way to test it client-side.

After a quick investigation, the most relevant place I found to test this logic are the OMERO.tables unit tests which have been de-activated since the Python 3 migration in https://github.com/ome/omero-py/commit/b185956d45670eab3af80e6cd2e334dcfe8b3b65.

https://github.com/jburel/omero-py/compare/portalocker...sbesson:omero-py:table_locking_tests proposes 2 commits to re-activate these tests on top of this branch. d677199c0d0996211c568c69533de648f4c60723 in particular tests the omero.LockTimeout exception raised by the portalocker logic. Note that a number of adjustments had to be made to the TestHDFList.testLocking which are quite possibly related to the upstream backwards incompatible changes in the Pytables 3.x line including https://www.pytables.org/MIGRATING_TO_3.x.html#api-name-changes and https://www.pytables.org/release-notes/RELEASE_NOTES_v3.1.x.html#backward-incompatible-changes

jburel commented 1 year ago

@sbesson do you want to push the commits to this branch?

sbesson commented 1 year ago

Done. I believe the readthedocs CI build failure is unrelated

jburel commented 1 year ago

Done. I believe the readthedocs CI build failure is unrelated

It is I have opened a PR. readthedocs is now requiring a flag. see https://github.com/ome/omero-py/pull/383

will-moore commented 1 year ago

I was trying to see where those unit tests are actually run, but I don't see anything explicit in the jobs above? Do we just assume they're running and passing silently?

sbesson commented 1 year ago

https://github.com/sbesson/omero-py/commit/1ab20f75e7e546cc54cefc91aa64a1d883c6549e is one way to address this issue and use the verbose flag to list skipped/passing/failing tests individually. Should I push it to this branch?

jburel commented 1 year ago

@sbesson please push