Closed graft closed 8 months ago
Is the motivation behind this PR to ease the load on Magma by chunking/paging updates together?
Yes; to clarify, when loading data for a particular project I found some file collections numbering in the thousands would result in the operation timing out. I was able to load the data by splitting up the upload using match/touch, but the process is difficult. This PR attempts to smooth that over by chunking updates to magma so updates do not time out.
Manual testing also passed!
This adds pagination to the update method in the Python magma client, and makes use of this in the Metis linker in airflow.
A few issues:
One problem is that I'm not sure what a decent page_size is, or whether it should be configurable by users. I guessed at 1000, but I'm not sure what this number should be. Different kinds of updates probably also have different update times: files, which require Magma phoning Metis, generally take a lot longer to handle than data attributes, so different scripts might need different page limits.
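For reviewers, here is a rough sketch of the client-side chunking idea, not the actual code in this PR. It assumes revisions are a dict keyed by record name, and `send_update` is a hypothetical stand-in for whatever call posts a single update to Magma; `paginate` and `update_in_pages` are likewise illustrative names. Making `page_size` a parameter is one way a file-heavy script could pick a smaller page than one that only touches data attributes.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List

# Assumed shape: record name -> attribute changes for that record.
Revisions = Dict[str, Dict[str, Any]]


def paginate(revisions: Revisions, page_size: int) -> Iterator[Revisions]:
    """Yield successive slices of `revisions`, each with at most `page_size` records."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    items = iter(revisions.items())
    while True:
        page = dict(islice(items, page_size))
        if not page:
            return
        yield page


def update_in_pages(
    revisions: Revisions,
    send_update: Callable[[Revisions], Any],  # hypothetical: posts one page
    page_size: int = 1000,  # the guess from this PR; tune per workload
) -> List[Any]:
    """Send one update request per page and return the responses in order."""
    return [send_update(page) for page in paginate(revisions, page_size)]
```

With 2,500 records and the default page size this would issue three requests of 1000, 1000 and 500 records.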
The update pagination also doesn't have a very nice sad path, i.e., if there are errors in the update, only some of them may be reported, and an incomplete/partial update may occur (that is, the pages, as far as Magma is concerned, are independent, and the notion of "pages" is only on the client side).
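One possible way to make the sad path more visible, sketched here as an idea rather than something this PR implements: record which pages were applied, which failed, and which were never sent, so a caller can at least tell that a partial update happened. `PagedUpdateReport`, `send_pages` and `send_update` are hypothetical names, and the sketch assumes a failed page surfaces as a raised exception.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PagedUpdateReport:
    applied: List[int] = field(default_factory=list)     # page indexes that succeeded
    failed: Dict[int, str] = field(default_factory=dict)  # page index -> error message
    skipped: List[int] = field(default_factory=list)      # pages never sent

    @property
    def partial(self) -> bool:
        """True when some pages were applied but others were not."""
        return bool(self.applied) and bool(self.failed or self.skipped)


def send_pages(
    pages: List[Dict[str, Any]],
    send_update: Callable[[Dict[str, Any]], Any],  # hypothetical: posts one page
    stop_on_error: bool = True,
) -> PagedUpdateReport:
    """Send pages in order, recording the outcome of each one."""
    report = PagedUpdateReport()
    for index, page in enumerate(pages):
        if stop_on_error and report.failed:
            # An earlier page failed; do not send anything further.
            report.skipped.append(index)
            continue
        try:
            send_update(page)
        except Exception as exc:  # assumed: errors arrive as exceptions
            report.failed[index] = str(exc)
        else:
            report.applied.append(index)
    return report
```

This does not make the update atomic. Pages that were already applied stay applied, since Magma treats each request independently; the report only tells the caller where things stopped.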
To test: