jburel opened this issue 3 years ago
The issue with spaces in column names has been mentioned several times. As far as I understand, the investigation seemed to indicate that the limitation comes from PyTables, i.e. the underlying storage mechanism for OMERO.tables.
Looking for pointers in the source code: do we know whether the querying issue is related to the NaturalNameWarning thrown in:
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L21
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L47-L49
https://github.com/PyTables/PyTables/blob/0eed850b9031fb540edd2c1ff5c81b91efeba9d6/tables/path.py#L87-L90
If this is the underlying problem, other characters commonly used in column headers, such as () or [], would suffer from the same issue.
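As far as I understand, PyTables emits NaturalNameWarning when a node or column name cannot be used as a plain Python attribute name (natural naming). The sketch below approximates that check with the standard library only; it is not the PyTables code itself, and the exact rules in tables/path.py (e.g. ASCII-only identifiers, reserved prefixes) may differ slightly. The example headers are made up.

```python
import keyword


def is_natural_name(name: str) -> bool:
    """Approximate check: can `name` be used as a plain Python attribute
    name, and therefore appear directly in a PyTables condition string?"""
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical column headers: spaces, parentheses and brackets all fail.
headers = ["Compound_Name", "Compound Name", "Concentration (uM)", "Well[0]", "class"]
for header in headers:
    print(f"{header!r}: {is_natural_name(header)}")
```

Every header except Compound_Name fails the check, which is consistent with () and [] suffering from the same problem as spaces.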
/cc @will-moore
An option could be to also attach the CSV alongside the table. In some cases it is good to have all the data in hand.
I can definitely see having the CSV attached as a workaround, but to some extent it amounts to saying that the tables service does not suffice.
The CSV is a workaround, but it can be a valid option depending on the language used to access the data, e.g. R, because of the data conversion between Java and R. As it stands, the service is not sufficient, so we need to revisit it.
https://github.com/ome/omero-py/pull/287 starts exploring solutions for querying tables on columns with spaces in their names.
The underlying problem is that you cannot write a valid PyTables condition that refers to such a column: e.g. table.where('my column == "foo"') is not valid, because "my column" is not a valid Python identifier.
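My understanding is that PyTables hands the condition string to numexpr, which parses it as a Python expression, so every column referenced in it must be a valid identifier. The stdlib-only sketch below (the helper name condition_variables is hypothetical) shows that a condition containing "my column" does not even parse, while the same condition with a substitution variable does.

```python
import ast
from typing import Set


def condition_variables(condition: str) -> Set[str]:
    """Return the variable names referenced by a condition string.

    Raises SyntaxError if the condition is not a valid Python expression,
    which is what happens when a column name contains a space."""
    tree = ast.parse(condition, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


try:
    condition_variables('my column == "foo"')
except SyntaxError as exc:
    print("invalid condition:", exc.msg)

# Replacing the column name with an identifier makes the condition parseable.
print(condition_variables('c0 == "foo"'))
```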
https://github.com/ome/omero-py/pull/287 contains a proof of concept showing that these queries are possible by using a substitution variable in the condition and condvars to map that variable to the appropriate table column, retrieved using getattr.
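The substitution idea can be illustrated without PyTables. In this sketch, which is not the code from the pull request, types.SimpleNamespace stands in for table.cols (getattr accepts attribute names containing spaces, even though dotted access cannot express them), and a simple row-by-row eval stands in for the evaluation PyTables performs in Table.where. The column values are made up.

```python
from types import SimpleNamespace

# Stand-in for table.cols: "Compound Name" is reachable only through getattr().
cols = SimpleNamespace(**{
    "Compound Name": ["Remdesivir", "Aspirin", "Remdesivir"],
    "Well": ["A1", "A2", "B1"],
})

# The condition uses a substitution variable instead of the column name...
condition = 'c0 == "Remdesivir"'
# ...and condvars maps that variable to the real column, found via getattr().
condvars = {"c0": getattr(cols, "Compound Name")}


def where(condition, condvars):
    """Yield indices of rows for which `condition` is true.

    A naive row-wise stand-in for Table.where, for illustration only."""
    code = compile(condition, "<condition>", "eval")
    n_rows = len(next(iter(condvars.values())))
    for i in range(n_rows):
        row_vars = {name: values[i] for name, values in condvars.items()}
        if eval(code, {"__builtins__": {}}, row_vars):
            yield i


print(list(where(condition, condvars)))  # [0, 2]
```

With PyTables itself, the equivalent call would presumably look like table.where('c0 == b"Remdesivir"', condvars={"c0": getattr(table.cols, "Compound Name")}), with a bytes literal assuming a string column stored as bytes.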
This is currently blocked on passing the condvars mapping through the remote API. Up for discussion, but I suspect one way forward would be to define an API that passes the mapping as a simple <variable name>: <column name> dictionary and internalizes the logic of retrieving each column using getattr.
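One possible shape for the server-side half of such an API is sketched below. The function resolve_condvars and its signature are hypothetical, not an existing OMERO.tables call; SimpleNamespace and a plain list stand in for table.cols and table.colnames. The client would only send the condition string plus a plain {variable name: column name} dictionary, and the server would validate it and resolve each column with getattr.

```python
import keyword
from types import SimpleNamespace


def resolve_condvars(cols, colnames, mapping):
    """Hypothetical helper: turn a plain {variable name: column name} dict
    received from a client into condvars, looking columns up with getattr()."""
    condvars = {}
    for variable, column_name in mapping.items():
        # The variable appears in the condition string, so it must be a
        # plain Python identifier.
        if not variable.isidentifier() or keyword.iskeyword(variable):
            raise ValueError(f"invalid variable name: {variable!r}")
        # Only allow real column names, not arbitrary attributes of `cols`.
        if column_name not in colnames:
            raise KeyError(f"unknown column: {column_name!r}")
        condvars[variable] = getattr(cols, column_name)
    return condvars


# Stand-ins for table.cols and table.colnames.
cols = SimpleNamespace(**{"Compound Name": ["Remdesivir", "Aspirin"], "Well": ["A1", "A2"]})
colnames = ["Compound Name", "Well"]

print(resolve_condvars(cols, colnames, {"c0": "Compound Name"}))
# {'c0': ['Remdesivir', 'Aspirin']}
```

Checking against the list of column names, rather than using hasattr, keeps a client from reaching unrelated attributes of the columns object.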
The CSV workaround is not really needed: I have opted to use the Web API to load the table data and it works nicely. It has been used in https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/notebooks/idr0094-ic50.ipynb and https://github.com/IDR/idr0094-ellinger-sarscov2/blob/master/apps/app.R
The corresponding change has been merged upstream in OMERO.py. https://github.com/ome/openmicroscopy/pull/6283/files provides a proof of concept of how to write a query against a column with a space in its name. I have not retested in the IDR context, but I assume this issue can either be closed (as we decided it was not specific to the metadata plugin) and/or moved to a documentation issue?
Most column names in IDR tables contain spaces. This means it is not possible to retrieve rows by specifying a value in a given column, e.g. "give me the rows with Remdesivir in the Compound Name column". To filter, one needs to load the full table (~15 min loading time) just to retrieve a few relevant rows: in the Remdesivir example, only 24 of 9792 rows are relevant.
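The sketch below illustrates why this client-side workaround is expensive: every row has to be loaded before any filtering happens, so the cost grows with the size of the table rather than with the number of matches. The CSV text is a made-up stand-in for a downloaded table, not IDR data, and rows_matching is a hypothetical helper.

```python
import csv
import io

# Toy stand-in for a fully downloaded table (made-up rows, not IDR data).
TABLE_CSV = """Compound Name,Well,Plate
Remdesivir,A1,P1
Aspirin,A2,P1
Remdesivir,B1,P2
"""


def rows_matching(csv_text, column, value):
    """Load every row, then keep only those where `column` equals `value`."""
    all_rows = list(csv.DictReader(io.StringIO(csv_text)))  # whole table read first
    matches = [row for row in all_rows if row[column] == value]
    return matches, len(all_rows)


matches, total = rows_matching(TABLE_CSV, "Compound Name", "Remdesivir")
print(f"{len(matches)}/{total} rows relevant")  # 2/3 rows relevant
```

A server-side condition, as discussed above, would instead return only the matching rows.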