ome / omero-metadata

OMERO plugin for metadata manipulation https://www.openmicroscopy.org/omero/
GNU General Public License v2.0

OMERO table space in column name #57

Open jburel opened 3 years ago

jburel commented 3 years ago

Tables in IDR have spaces in most of their column names. This means that it is not possible to retrieve rows by specifying the value in a given column, e.g. give me the rows with Remdesivir in the Compound Name column. To filter, one needs to load the full table (~15 min loading time) to retrieve a few relevant rows; in the Remdesivir example, 24/9792 rows are relevant.

sbesson commented 3 years ago

The issue with spaces in column names has been mentioned several times. As far as I understand, the investigation seemed to indicate that the limitation comes from PyTables, i.e. the underlying storage mechanism for OMERO.tables.

Trying to find a few pointers from the source code: do we know if the querying issue is related to the NaturalNameWarning thrown in:

https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L21
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L47-L49
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L87-L90

If this is the underlying problem, other characters commonly used in column headers like () or [] would also suffer from the same issue.

/cc @will-moore

jburel commented 3 years ago

An option could be to also add the CSV alongside the table. In some cases it is good to have all the data in your hands.

jburel commented 3 years ago

see https://trello.com/c/9rIyIDoi/126-cant-query-omerotable-column-names-with-spaces

joshmoore commented 3 years ago

I can definitely see having the CSV attached as a workaround, but to some extent, it's saying that the tables service does not suffice.

jburel commented 3 years ago

The CSV is a workaround but can be a valid option depending on the language used to access the data, e.g. R, because of the Java <-> R data manipulation involved. As it stands, the service is not enough, so we need to revisit it.

sbesson commented 3 years ago

https://github.com/ome/omero-py/pull/287 starts exploring solutions for searching tables using columns with spaces in their names.

The underlying problem is that you cannot write a valid PyTables condition that names such a column directly, e.g. table.where('my column == "foo"') is not valid. https://github.com/ome/omero-py/pull/287 contains a proof of concept showing that these queries are possible by using a substitution variable in the condition and condvars to map that variable to the appropriate column in the table, retrieved using getattr.

Currently blocked on passing this condvars mapping through the remote API. Up for discussion, but I suspect one way forward would be to define an API that passes the mapping as a simple <variable name>: <column name> dictionary and internalizes the logic that retrieves the column using getattr.
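To make the proposed mapping concrete, here is a minimal sketch of how a <variable name>: <column name> dictionary could be resolved on the server side with getattr. The helper name `resolve_condvars` and the `SimpleNamespace` stand-in for the table's column container are hypothetical and are not taken from omero-py or PyTables; with a real PyTables table the container would presumably be `table.cols`, and the result would be passed as `table.where(condition, condvars=condvars)`.

```python
from types import SimpleNamespace
from typing import Any, Dict


def resolve_condvars(cols: Any, mapping: Dict[str, str]) -> Dict[str, Any]:
    """Map each substitution variable to the column object it stands for.

    `cols` is any object exposing columns as attributes (hypothetical
    stand-in for a PyTables `table.cols`); `mapping` is the proposed
    <variable name>: <column name> dictionary.
    """
    resolved = {}
    for variable, column_name in mapping.items():
        # The variable must be usable inside a condition string,
        # so it has to be a plain identifier (no spaces, brackets, ...).
        if not variable.isidentifier():
            raise ValueError(f"{variable!r} is not a valid variable name")
        try:
            # getattr accepts arbitrary strings, including names with spaces.
            resolved[variable] = getattr(cols, column_name)
        except AttributeError:
            raise KeyError(f"no column named {column_name!r}") from None
    return resolved


# Stand-in column container with a space in the column name.
cols = SimpleNamespace()
setattr(cols, "Compound Name", ["Remdesivir", "Chloroquine"])

# The condition only refers to the variable `c`, never to the raw column name.
condition = 'c == "Remdesivir"'
condvars = resolve_condvars(cols, {"c": "Compound Name"})
```

Keeping the mapping as plain strings means only the dictionary and the condition need to cross the remote API; the column objects themselves are looked up where the table lives.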

jburel commented 3 years ago

The CSV workaround is not really needed. I have opted to use the Web API to load the table data and it works nicely. It has been used in https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/notebooks/idr0094-ic50.ipynb and https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/apps/app.R

sbesson commented 2 years ago

The corresponding change has been merged upstream in OMERO.py. https://github.com/ome/openmicroscopy/pull/6283/files brings a proof of concept of how to write a query against a column with a space in its name. I have not retested in the IDR context, but I assume this issue can either be closed (as we decided it was not an issue specific to the metadata plugin) and/or moved as a documentation issue?