ome / omero-blitz

Gradle project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
0 stars 15 forks source link

Lack of cleanup in ProcessContainer #141

Open chris-allan opened 1 year ago

chris-allan commented 1 year ago

While debugging a series of out of memory exceptions on one particular installation during a very large (over 100,000) number of imports Glencoe has noticed that the number of ome.services.blitz.repo.ManagedImportProcessI instances on the heap never decreases. The longer a server is running and the more complex the imports, the more of the available heap these objects consume. If left long enough, this will make a server unusable. As currently implemented this is due to ProcessContainer.removeProcess() not being called.

A lot of this code is quite old. The ProcessContainer itself was added as part of the initial work on OMERO 5.0.0 in ome/openmicroscopy#577 back in January, 2013. It's remained largely unchanged since. The intent for most of this work is outlined in the OMERO.fs documentation:

A few of these things never really got implemented as originally intended. In particular, the Process.ping(), Process.shutdown(), ProcessContainer.pingAll() and ProcessContainer.shutdownAll() methods are either not called or throw exceptions signifying they are not yet implemented. It's also not really clear if currently the ProcessContainer serves any other purpose beyond satisfying ManagedRepository.listImports(). While this method is client API accessible I'm unable to find any use of it by the OMERO CLI tooling, OMERO.web or OMERO.insight.

There is definitely more investigation and discussion to be had on whether it is valuable to implement the original intent of ProcessContainer either in full or in part. The community now has over 10 years of experience with OMERO.fs to draw on and lots of things have been added to the OMERO server import infrastructure during that time.

While this happens, Glencoe has started a repository to work on an OMERO server plugin which can act as a steward for ProcessContainer. Hopefully, this will help people out who currently have problems and support the discussion on where to go from here:

/cc @jburel, @joshmoore, @kkoz, @sbesson

chris-allan commented 1 year ago

You can check how many ManagedImportProcessI instances are in flight on your OMERO server system by executing:

jmap -histo:live <omero_blitz_pid> &> histo.out
grep 'ManagedImportProcess' histo.out

This will give output like this:

  85:          2362         188960  ome.services.blitz.repo.ManagedImportProcessI

Where the first column is the object ranking by bytes consumed, the second is the number of instances, and the third is the number of bytes consumed.

joshmoore commented 1 year ago

Wow. "bug" feels like an understatement. At a glance, I would assume the change most inline with the original intent would be to pass the processes field into the ManagedImportProcessI instance in order to remove the instance during cleaning, somewhere here:

    public void closeCalled(int idx) {
        final UploadState state = uploaders.getIfPresent(idx);
        if (state == null) {
            log.warn(String.format("closeCalled(%s) - no such object", idx));
        } else {
            uploaders.invalidate(idx);
            log.debug(String.format("closeCalled(%s) successfully", idx));
        }
    }

    //
    // CLOSE LOGIC
    //

    @Override
    protected void preClose(Current current) throws Throwable {
        // no-op
    }

    @Override
    protected void postClose(Current current) {
        // no-op
    }
chris-allan commented 1 year ago

The first pass of a cleanup plugin for OMERO servers has now been released (0.1.0):

imagesc-bot commented 1 year ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/first-release-of-omero-process-container-steward/85067/1