ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0

Tables Out of Memory Issues #314

Closed kkoz closed 2 years ago

kkoz commented 2 years ago

We found that when retrieving a large number of rows from certain OMERO tables, we ran into memory issues even when only one small column was selected. Upon investigation, we found that whenever column::read is called, every column of the table is retrieved for the requested rows (see https://github.com/ome/omero-py/blob/2b9d1b43f69e02d8784bfef50dd07415eb193f34/src/omero/columns.py#L108). The requested columns are only selected afterwards (see https://github.com/ome/omero-py/blob/2b9d1b43f69e02d8784bfef50dd07415eb193f34/src/omero/columns.py#L151). The PyTables API allows passing a "field" argument to its read and readCoordinates functions, which restricts the read to the specified column by name (see https://github.com/PyTables/PyTables/blob/a51c549c1555050d72e0aeb2ff5e04d48a71b1b4/tables/table.py#L1747). In this PR, we use this argument to avoid retrieving the entire table for each column.
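For illustration, here is a minimal sketch of the difference, assuming PyTables 3.x (where the coordinate read is spelled `read_coordinates`). The table layout, the column names (`image_id`, `score`, `label`) and the file name are hypothetical and are not taken from omero-py; this only shows the `field` argument, not the actual change in columns.py.

```python
import os
import tempfile

import tables  # PyTables 3.x assumed


# Hypothetical table layout, purely for this example.
class Measurement(tables.IsDescription):
    image_id = tables.Int64Col(pos=0)
    score = tables.Float64Col(pos=1)
    label = tables.StringCol(64, pos=2)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "measurements", Measurement)
        row = table.row
        for i in range(1000):
            row["image_id"] = i
            row["score"] = i * 0.5
            row["label"] = b"x"
            row.append()
        table.flush()

        # Without `field`: a structured array holding every column.
        full = table.read(0, 100)

        # With `field`: only the named column is returned.
        only_ids = table.read(0, 100, field="image_id")

        # The same argument works when reading specific row numbers.
        picked = table.read_coordinates([1, 5, 9], field="score")

print(full.dtype.names, full.nbytes)
print(only_ids.dtype, only_ids.nbytes)
print(picked)
```

In this toy table each row is 80 bytes wide (8 + 8 + 64), so the single-column read returns a tenth of the data of the full read; with wide OMERO tables the gap should be correspondingly larger.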

joshmoore commented 2 years ago

Not that I assume anyone is using it, but since fromrows is a public method, I've pushed a commit that keeps backwards compatibility. Let me know what you think.

joshmoore commented 2 years ago

Unrelated build failure:

E   FileExistsError: [Errno 17] File exists: path('/home/runner/omero/tmp/omero_runner')

Re-launched.

kkoz commented 2 years ago

@joshmoore I think all_rows doesn't exist anymore, so I changed it to rows, but other than that it looks good to me.

joshmoore commented 2 years ago

(Sad that my mistake didn't fail a test somewhere ;) )