ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0
20 stars 33 forks source link

Add logic to search OMERO.tables on non Pythonic named columns #287

Closed sbesson closed 2 years ago

sbesson commented 3 years ago

See https://github.com/ome/omero-metadata/issues/57 for the reference issue.

OMERO.tables HDF storage uses PyTables querying capabilities in particular the tables.get_where_list API. This API takes as a primary input a condition which syntax is defined as in https://www.pytables.org/usersguide/condition_syntax.html.

Columns named in a Python compliant way (i.e. matching ^[a-zA-Z_][a-zA-Z0-9_]*$), the column name can directly be used in the condition e.g. foo>1. Column names that do not match this patterns e.g. including spaces or brackets are supported but they need to be retrieved using getattr and they cannot directly used as an input condition. As it stands, this prevents the API from querying all the row based on a condition including such a column.

350dbc1 demonstrates how to use condvars and variable substitution to allow searching these non Pythonic named columns. ae5ca40 proposes to expand the semantics of the variables argument of tables.get_where_list to support these queries remotely. Any string/string key value pair is expanded as a string/column key/pair and passed to the underlying get_where_list API allowing variable substitution.

Using the OMERO API, this should allow the following types of queries

table.getWhereList("(count > 10) & (x =='pass')", variables={'x': rstring('Quality Control')}, start=0, stop=rowCount, step=0)
jburel commented 3 years ago
table.getWhereList("(count > 10) & (x =='pass')", variables={'x': 'Quality Control'}, start=0, stop=rowCount, step=0)

This is not what the test does rstring("Long column") cf. https://github.com/ome/openmicroscopy/pull/6283 I think it is preferable not to ask user of the API to have to use rstring

sbesson commented 3 years ago

PR description updated although the primary issue is rather at the level of the omero-blitz component as the Table.getWhereList API expects variables to be of type RTypeDict.

I would also be of the opinion of lowering the exposition of these types for consumers of the API if possible. What are the viable options?

joshmoore commented 3 years ago

What are the viable options?

A gateway method would be able to add a call to omero.rtypes.wrap, but we don't currently have a general interceptor for direct calls to Prx instances.

joshmoore commented 3 years ago

Thoughts on getting this out with #292?

sbesson commented 3 years ago

Answering https://github.com/ome/omero-py/pull/287#issuecomment-877390930 specifically, from a code perspective, I think the change here should be fairly harmless and releasable immediately. The PyTables API expectation is that the condvars variable will contain a dictionary pointing to Column instances. I have not been able to think of a scenario where the special handling of string values implemented here would break existing behaviors.

As discussed earlier this morning, I do not know of any immediate need for this fix so it probably makes sense to allow more time for the review of this PR alongside the examples and some technical documentation explaining its impact for API consumers.

imagesc-bot commented 2 years ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/omero-tables-questions/58161/4