ome / omero-blitz

Gradle project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
0 stars 15 forks source link

cleanup and expose import keep alives (see #64) #65

Closed joshmoore closed 4 years ago

joshmoore commented 5 years ago

Long-running imports (or imports against servers with artificially short timeouts) fail and then hang. This makes the import keep alive times configurable, increases logging, removes synchronization, and enables keep alives in more situations.

see: https://forum.image.sc/t/keepalive-failed-while-importing-50gb-slide-book-data-file/28588/2

joshmoore commented 5 years ago

With:

this is now what I see:

2019-08-16 14:52:24,634 17560      [l.Client-1] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 130
2019-08-16 14:53:11,753 64679      [1-thread-1] ERROR  me.formats.importer.util.ClientKeepAlive - Exception while executing ping(), logging Connector out:
java.lang.RuntimeException: Ice.ConnectionLostException
    error = 0
    at ome.formats.OMEROMetadataStoreClient.ping(OMEROMetadataStoreClient.java:841) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at ome.formats.importer.util.ClientKeepAlive.run(ClientKeepAlive.java:77) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: Ice.ConnectionLostException: null
    at IceInternal.AsyncResultI.__wait(AsyncResultI.java:276) ~[ice.jar:na]
    at omero.api.ServiceFactoryPrxHelper.end_keepAllAlive(ServiceFactoryPrxHelper.java:5963) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at omero.api.ServiceFactoryPrxHelper.keepAllAlive(ServiceFactoryPrxHelper.java:5846) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at omero.api.ServiceFactoryPrxHelper.keepAllAlive(ServiceFactoryPrxHelper.java:5833) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at ome.formats.OMEROMetadataStoreClient.ping(OMEROMetadataStoreClient.java:834) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    ... 7 common frames omitted
2019-08-16 14:53:11,755 64681      [1-thread-1] INFO         ome.formats.importer.ImportLibrary - Initializing log out
2019-08-16 14:53:14,903 67829      [2-thread-1] ERROR        ome.formats.importer.ImportLibrary - Error on import
java.lang.Exception: Import failure
    at ome.formats.importer.ImportLibrary.importImage(ImportLibrary.java:700) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at ome.formats.importer.ImportLibrary$1.call(ImportLibrary.java:353) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at ome.formats.importer.ImportLibrary$1.call(ImportLibrary.java:327) ~[omero-blitz-5.5.4-SNAPSHOT.jar:5.5.4-SNAPSHOT]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
2019-08-16 14:53:14,903 67829      [2-thread-1] INFO         ome.formats.importer.ImportLibrary - Exiting on error

i.e. a much cleaner shutdown. Now time to test normal timeouts and a very long fake.

joshmoore commented 5 years ago

Looks like the more stringent closing is leading to 2 new log messages:

2019-08-19 05:55:24,255 4720       [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /home/omero/workspace/OMERO-test-integration/src/dist/a.fake
Image:9343
Other imported objects:
Fileset:1064
2019-08-19 05:55:24,261 4726       [l.Client-0] INFO      ome.formats.importer.cli.ErrorHandler - Number of errors: 0

==> Summary
1 file uploaded, 1 fileset created, 1 image imported, 0 errors in 0:00:01.720
2019-08-19 05:55:24,273 4738       [      main] INFO         ome.formats.importer.ImportLibrary - Initializing log out

that's causing a few tests to fail. I'd propose to move these to DEBUG (and/or make them no-ops when there's no error condition).

joshmoore commented 5 years ago

Reverse diff of the changes I used to omero-server-docker's docker-compose.yml in order to test very, very short timeouts:

commit 8e1dbd7fc6df0041ef742d8f933d8ee30bc8df37
Author: jmoore <josh@glencoesoftware.com>
Date:   Fri Aug 16 14:01:48 2019 +0100

    Revert timeouts

diff --git a/Dockerfile b/Dockerfile
index a60028a..4c72c36 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -35,9 +35,9 @@ USER omero-server
 RUN sed -i.bak -re \
     's/(name="omero.fs.importArgs"\s+value=)""/\1"--no-upgrade-check"/' \
     /opt/omero/server/OMERO.server/etc/templates/grid/templates.xml
-RUN sed -i.bak -re \
-    's/(name="session-timeout"\s+default=)"600"/\1"10"/' \
-    /opt/omero/server/OMERO.server/etc/templates/grid/templates.xml
+#RUN sed -i.bak -re \
+#    's/(name="session-timeout"\s+default=)"600"/\1"10"/' \
+#    /opt/omero/server/OMERO.server/etc/templates/grid/templates.xml
 EXPOSE 4063 4064
 VOLUME ["/OMERO", "/opt/omero/server/OMERO.server/var"]

diff --git a/docker-compose.yml b/docker-compose.yml
index 36f95a4..85fc5d6 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -45,7 +45,7 @@ services:
       - CONFIG_omero_db_pass=omero
       - CONFIG_omero_db_name=omero
       - ROOTPASS=omero
-      - CONFIG_omero_sessions_timeout=10000
+      # CONFIG_omero_sessions_timeout=10000
     networks:
       - omero
     ports:

Note: this also needs testing with long timeouts which is what https://github.com/ome/bioformats/pull/3422 is for.

mtbc commented 5 years ago

Aha, bin/omero import --advanced-help now includes,

--keep-alive=SECS Frequency in seconds for pinging the server.

mtbc commented 5 years ago

Also, omero.sessions.timeout turns out to be a server-side property set in milliseconds, defaulting to 600,000ms.

mtbc commented 5 years ago

With omero.sessions.timeout=6000 I get Ice.UnknownException: null for verifyUpload when importing large SVS files but they import fine once I include -- --keep-alive=4 so that is looking as we might hope. :+1: