ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
21 stars 32 forks source link

Switch result logging to DEBUG level #417

Closed chris-allan closed 2 months ago

chris-allan commented 3 months ago

Converting every remoted result value to a string can be an incredibly expensive operation. This is especially true if hundreds of thousands of table cells, along with their row indices are passing through the OMERO.tables API. In such a scenario the omero.grid.Data object will be fully converted to a string and then the first 120 characters logged which can take longer than the query, slice or read operation itself.

Rather than remove the functionality entirely, which might be justified (the first 120 characters is rarely that useful), I've opted to raise the level to debug. This impacts OMERO.tables and the script Processor logging.

The change improves performance and memory usage when retrieving even moderate amounts of data quite substantially. We could look to adjusting OMERO.web's MAX_TABLE_DOWNLOAD_ROWS up accordingly.

/cc @erindiel, @kkoz, @DavidStirling, @emilroz

joshmoore commented 3 months ago

Keeping a parallelism with the Java processes was also going to be my primary comment. Otherwise, :+1:

chris-allan commented 3 months ago

Keeping a parallelism with the Java processes was also going to be my primary comment. Otherwise, 👍

Maybe as a compromise we keep f"Rslt: {type(rv)}" or similar at INFO?

The Java logging stack is pretty good at abridging/truncating large objects and object hierarchies and avoids calling toString() on Ice objects so we don't have the same concerns we have here.

joshmoore commented 3 months ago

I'm not overly worried, but I have definitely written parsers in the past which look for Rslt: so including the line, even if it just says "omitted" would be lower impact. In the case of tables.py specifically, another possible middle ground would be including the size of the returned object, but again, not a strong suggestion.

chris-allan commented 3 months ago

:+1:

At INFO, the default level, you now get something like:

2024-06-21 08:42:26,634 INFO  [                            omero.remote] (Dummy-4   )  Meth: TableI.getNumberOfRows
2024-06-21 08:42:26,634 INFO  [                     omero.tables.TableI] (Dummy-4   ) Table-B22F57F4-C4DF-409A-9211-449420B3C553.getNumberOfRows() => 4
2024-06-21 08:42:26,634 INFO  [                            omero.remote] (Dummy-4   )  Rslt: <class 'int'>