ome / omero-web

Django-based OMERO.web client
https://www.openmicroscopy.org/omero
16 stars 29 forks source link

New table API endpoints for lower level table access #564

Open knabar opened 2 weeks ago

knabar commented 2 weeks ago

This PR adds two new endpoints for lower level table access:

Examples:

/webgateway/table/123/rows/?query=object%3E100000&start=8470
{"rows":[8470,8471,8472,8473,8474,8475,8476],"meta":{"rowCount":8477,"start":8470,"end":8477}}
/webgateway/table/123/slice/?rows=8005-8010&columns=0-1
{"columns":[[114391,114392,114393,114394,114395,114396],[NaN,NaN,NaN,NaN,NaN,NaN]],"meta":{"columns":["object","name"],"rowCount":8477}}

Notes:

will-moore commented 1 week ago

It looks like rowCount and end are based on the whole table rather than the results of the query: e.g. on merge-ci with webgateway/table/22909/rows/?query=(Count>12)%26(Count<23) I'm seeing:

{
"rows": [
  1,
  3
],
"meta": {
  "rowCount": 13,
  "start": 0,
  "end": 13
}
}

Is that expected?

knabar commented 1 week ago

@will-moore Yes, that is expected.

The total number of results is not available, since the query is not run on the whole table, but only from start to start+MAX_TABLE_SLICE_SIZE.

Figuring out the total is left to the client, by adding up the number of returned rows for each call. For tables with less than MAX_TABLE_SLICE_SIZE rows, it'll be just one call anyway.

end is not actually the number of rows of the table, but the row where querying ended, so the client can easily continue querying by passing in the end of the first query as the start of the second. end==rowCount indicates that there is nothing else to query.

start is there to indicate at which row the querying started, which is redundant, since the client would know anyway, but perhaps there are situations where it is useful.

will-moore commented 1 week ago

Everything is working just fine for /slice when I use the query correctly, so I tried playing about a bit (Not all these need to be fixed/changed)... If I make a mistake, the error handling is varied:

E.g. If I make an invalid query like this: /slice/?rows=3-1&columns=oops I get a nice "error": "Need to specify comma-separated list of rows and columns".

If I forget what needs to be in the query e.g. /slice/?rows=3-1 I get a less friendly:

"message": "'NoneType' object has no attribute 'split'",
"stacktrace": "Traceback (most recent call last):\n  File \"/home/omero/workspace/OMERO-web/.venv3/lib64/python3.9/site-packages/omeroweb/webgateway/views.py\", line 1447, in wrap\n    rv = f(request, *args, **kwargs)\n  File \"/home/omero/workspace/OMERO-web/.venv3/lib64/python3.9/site-packages/omeroweb/webgateway/views.py\", line 3586, in perform_slice\n    for item in source.get(\"columns\").split(\",\")\nAttributeError: 'NoneType' object has no attribute 'split'\n"

if I use the range the wrong way around I get the whole table (regardless of the number of rows and columns):

?rows=3-1&columns=4-3

If I go out of range,

/slice/?rows=3-1&columns=1-100

I get "error": "Error slicing table". Don't know how hard/expensive it is to check this before trying the slicing to give a more helpful message?

knabar commented 1 week ago

@will-moore added better error handling and checks

knabar commented 1 week ago

Made some more convenience changes:

chris-allan commented 1 week ago

I don't think having collapse is a good precedent to set. Looping over the results in pure Python is going to perform wildly differently depending on the result set so knowing whether to use collapse is not something that is easy to do.

Edit: Furthermore, I think it's a usability downgrade. The client then would need to decompress the result set of getWhereList() in order to know which rows actually match or use it with slice(). If you want contiguous slices then exposing read() is better anyway as it just takes the column numbers and a start stop.

knabar commented 5 days ago

Removed the collapse option after some performance testing that did not show a worthwhile improvement