ome / omero-web

Django-based OMERO.web client
https://www.openmicroscopy.org/omero

Batch by MAX_TABLE_DOWNLOAD_ROWS #563

Open chris-allan opened 2 weeks ago

chris-allan commented 2 weeks ago

Since we will be loading up to MAX_TABLE_DOWNLOAD_ROWS rows into memory and presenting them for download, it makes sense for batching to happen at the same scale. With ome/omero-web#554 this is less of a concern, but it still makes sense to unify. In the near future it may be pertinent to remove the client-side batching and "lazy" semantics entirely, as they no longer really make sense or are completely unused.

/cc @erindiel, @sbesson
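
A minimal sketch of the batching proposed above, driving the batch size from the configured maximum rather than a hardcoded client-side value. The names here (`batched_ranges`, the default of 10000) are illustrative, not the actual omero-web API:

```python
# Sketch: derive fetch batches from the configured download maximum
# instead of a fixed client-side batch size. MAX_TABLE_DOWNLOAD_ROWS
# default (10000) is the value mentioned in this thread.
MAX_TABLE_DOWNLOAD_ROWS = 10000


def batched_ranges(total_rows, batch_size=MAX_TABLE_DOWNLOAD_ROWS):
    """Yield (start, stop) row ranges covering total_rows in
    batch_size-sized chunks; the last chunk may be smaller."""
    for start in range(0, total_rows, batch_size):
        yield start, min(start + batch_size, total_rows)
```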

chris-allan commented 2 weeks ago

Background for this change includes high-performance scenarios where MAX_TABLE_DOWNLOAD_ROWS is one or more orders of magnitude larger than the default (10000). Forcing the batch size to 1000 incurs incredibly high overhead in such a scenario.

will-moore commented 2 weeks ago

I don't think the "lazy" loading is unused. The idea with the lazy loading is to remove (or allow much larger values for) the upper limit on the number of rows when you're downloading a table as CSV, since you're "streaming" the download. Looking into this now, it seems you can't actually download more than MAX_TABLE_DOWNLOAD_ROWS at a time, since this is blocked by the check_max_rows flag. But if we were to remove that check, it should allow larger amounts to be downloaded.
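
For concreteness, a sketch of the "streaming" semantics the lazy path is meant to enable: rows are fetched a batch at a time and emitted as CSV chunks, so the full table never sits in memory. `fetch_batch` is a hypothetical stand-in for the OMERO.tables read call; the real view would wrap such a generator in Django's `StreamingHttpResponse`:

```python
# Sketch: stream a table as CSV in batches rather than materialising
# all rows up front. fetch_batch(start, stop) is an assumed callable
# returning an iterable of row sequences for that range.
import csv
import io


def csv_chunks(fetch_batch, total_rows, batch_size):
    """Yield CSV-formatted text chunks, one per batch of rows."""
    for start in range(0, total_rows, batch_size):
        buf = io.StringIO()
        writer = csv.writer(buf)
        for row in fetch_batch(start, min(start + batch_size, total_rows)):
            writer.writerow(row)
        yield buf.getvalue()
```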

I guess the optimal batch size when downloading would depend on how much data you can load at a time (not sure what the limiting factor is here - Ice Memory Limit?) and how many columns you have in your table.
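
One way to pick a batch size along the lines suggested above: divide a per-batch byte budget by an estimated row width. The budget, per-cell width, and floor here are illustrative assumptions; in practice the limiting factor (e.g. an Ice message size limit) would set the budget:

```python
# Sketch: choose a batch size from a memory budget and the table's
# column count, rather than a fixed constant.
def batch_size_for(budget_bytes, num_columns, bytes_per_cell=8, floor=1000):
    """Largest batch whose estimated size fits budget_bytes,
    never going below floor rows."""
    row_bytes = num_columns * bytes_per_cell
    return max(floor, budget_bytes // row_bytes)


# e.g. a 64 MiB budget with 50 numeric columns allows ~168k rows/batch
```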

chris-allan commented 2 weeks ago

It would allow more rows to be downloaded, but it would still lock the table, keep the service and session alive, block the OMERO.web worker, etc. That is not good enough "streaming" for it to be enabled, which is why we don't use it and why you implemented #300 the way you did.