ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0

Allow a caller to ignore row numbers #418

Open chris-allan opened 2 weeks ago

chris-allan commented 2 weeks ago

For calls to readCoordinates, read, and slice, the values in the Data response are returned in the same order as the rows were requested. Including the row numbers in the response is convenient, but when a large number of cells is returned it adds memory and serialization overhead. This is especially true when retrieving a few columns over many rows; in that case rowNumbers can cost more to include than the data itself.
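To put rough numbers on that claim, the sketch below estimates how many bytes each value sequence occupies on the wire. It assumes the Ice 1.1 encoding (a length prefix of 1 byte below 255, otherwise 5 bytes; 8 bytes per long; 1 byte per bool; strings as a length prefix plus UTF-8 bytes) and deliberately ignores class headers, column names, and other per-object overhead, so treat it as an approximation only. The helper names are illustrative, not part of omero-py. Under those assumptions, one million row numbers cost about 8 MB, against about 1 MB for a bool column and about 4 MB for a column of short well names such as "A01".

```python
"""Rough Ice 1.1 wire-size estimate for a Data payload (illustrative sketch).

Assumptions: only the value sequences are counted; class headers,
column names, and other per-object overhead are ignored.
"""


def ice_size_bytes(n: int) -> int:
    # Ice encodes a sequence or string length as 1 byte when < 255,
    # otherwise as a 0xFF marker byte followed by a 4-byte int.
    return 1 if n < 255 else 5


def long_seq_bytes(values) -> int:
    # sequence<long> (e.g. Data.rowNumbers): length prefix + 8 bytes each
    return ice_size_bytes(len(values)) + 8 * len(values)


def bool_seq_bytes(values) -> int:
    # sequence<bool>: length prefix + 1 byte each
    return ice_size_bytes(len(values)) + len(values)


def string_seq_bytes(values) -> int:
    # sequence<string>: length prefix, then each string encoded as
    # its own length prefix + UTF-8 bytes
    total = ice_size_bytes(len(values))
    for s in values:
        encoded = s.encode("utf-8")
        total += ice_size_bytes(len(encoded)) + len(encoded)
    return total


if __name__ == "__main__":
    n_rows = 1_000_000
    rows = list(range(n_rows))
    flags = [i % 2 == 0 for i in rows]             # a bool column
    wells = ["A%02d" % (i % 12 + 1) for i in rows]  # short strings "A01".."A12"

    print("rowNumbers:       ", long_seq_bytes(rows))
    print("bool column:      ", bool_seq_bytes(flags))
    print("string column:    ", string_seq_bytes(wells))
```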

Here we use the Ice context key "omero.tables.include_row_numbers" to change this behaviour additively, without changing any of the Ice method prototypes.
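A minimal sketch of how a server-side method might honour the context key. This is not the omero-py implementation: the helper names (wants_row_numbers, build_data) and DataStub, a stand-in for omero.grid.Data, are hypothetical. It also assumes that row numbers stay included unless the caller explicitly sends "false" (compared case-insensitively), which is one way to keep the change additive for existing callers. In a real servant the dictionary would come from the request's Ice.Current, via current.ctx.

```python
"""Sketch: honouring "omero.tables.include_row_numbers" from an Ice context.

Hypothetical helper names; not the omero-py implementation. Assumes row
numbers are included by default so existing callers see no change.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

INCLUDE_ROW_NUMBERS_KEY = "omero.tables.include_row_numbers"


def wants_row_numbers(ctx: Optional[Dict[str, str]], default: bool = True) -> bool:
    """Return False only when the caller explicitly sends "false"."""
    if not ctx or INCLUDE_ROW_NUMBERS_KEY not in ctx:
        return default
    return ctx[INCLUDE_ROW_NUMBERS_KEY].strip().lower() != "false"


@dataclass
class DataStub:
    # Stand-in for omero.grid.Data with only the fields this sketch needs.
    lastModification: int
    rowNumbers: List[int] = field(default_factory=list)
    columns: list = field(default_factory=list)


def build_data(rows, columns, last_modification, ctx=None) -> DataStub:
    # In a servant method, ctx would be current.ctx from Ice.Current.
    data = DataStub(lastModification=last_modification, columns=columns)
    if wants_row_numbers(ctx):
        data.rowNumbers = list(rows)
    # Otherwise rowNumbers stays empty; columns keep the requested order.
    return data
```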

/cc @erindiel, @kkoz, @DavidStirling, @emilroz

chris-allan commented 2 weeks ago

Example when used on a small table:

In [1]: sr = client.getSession().sharedResources()
   ...: a = sr.openTable(omero.model.OriginalFileI(24965, False))

In [2]: a.slice([0], [3, 0], {"omero.tables.include_row_numbers": "true"})
Out[2]:
object #0 (::omero::grid::Data)
{
    lastModification = 1718797774346
    rowNumbers =
    {
        [0] = 3
        [1] = 0
    }
    columns =
    {
        [0] = object #1 (::omero::grid::StringColumn)
        {
            name = ImageName
            description =
            size = 53
            values =
            {
                [0] = siControl_N20_Cep215_I_20110411_Mon-1509_0_SIR_PRJ.dv
                [1] = Centrin_PCNT_Cep215_20110506_Fri-1608_0_SIR_PRJ.dv
            }
        }
    }
}

In [3]: a.slice([0], [3, 0], {"omero.tables.include_row_numbers": "false"})
Out[3]:
object #0 (::omero::grid::Data)
{
    lastModification = 1718797774346
    rowNumbers =
    {
    }
    columns =
    {
        [0] = object #1 (::omero::grid::StringColumn)
        {
            name = ImageName
            description =
            size = 53
            values =
            {
                [0] = siControl_N20_Cep215_I_20110411_Mon-1509_0_SIR_PRJ.dv
                [1] = Centrin_PCNT_Cep215_20110506_Fri-1608_0_SIR_PRJ.dv
            }
        }
    }
}
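Because the values come back in the requested order, a caller that suppresses rowNumbers can still pair each value with its row locally. The sketch below shows one way to do that; values_by_row and NO_ROW_NUMBERS are hypothetical names for illustration, not omero-py API. It works on any object shaped like the Data response above (a columns list whose items have name and values).

```python
"""Sketch: re-pairing values with row numbers on the client.

Relies only on the response preserving the requested row order.
values_by_row is a hypothetical helper, not part of omero-py.
"""
from typing import Dict, List, Sequence

NO_ROW_NUMBERS = {"omero.tables.include_row_numbers": "false"}


def values_by_row(requested_rows: Sequence[int], data) -> List[Dict[int, object]]:
    """Return one {row: value} dict per column in the response."""
    result = []
    for column in data.columns:
        # Position i in column.values corresponds to requested_rows[i].
        if len(column.values) != len(requested_rows):
            raise ValueError(
                "column %s has %d values for %d requested rows"
                % (column.name, len(column.values), len(requested_rows))
            )
        result.append(dict(zip(requested_rows, column.values)))
    return result


# Usage against the table opened above (needs a live OMERO session):
#   rows = [3, 0]
#   data = a.slice([0], rows, NO_ROW_NUMBERS)
#   image_names = values_by_row(rows, data)[0]  # {3: ..., 0: ...}
```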

If we're happy with the implementation, I'll open separate PRs to add integration tests, like those we have for the bitmask query, and to update the main OMERO.tables documentation to describe the feature.

chris-allan commented 2 weeks ago

> 👍
>
> I'm surprised that the row numbers are longer than other columns except perhaps bools 😏 but I can definitely see how they would effectively double the overhead.

This is also true for short string columns. For memory usage in Python in particular, it is also true where the same numbers or strings repeat. Both cases are common in many of the data analysis outputs we work with.

chris-allan commented 6 hours ago

Integration test added in ome/openmicroscopy#6396.

chris-allan commented 6 hours ago

Documentation added in ome/omero-documentation#2441.