mountetna / monoetna

mono-repository version of etna projects
GNU General Public License v2.0

allow update pagination in etna-py magma client + paginate metis linker #1270

Closed: graft closed this 8 months ago

graft commented 9 months ago

This adds pagination to the update method in the Python magma client, and makes use of this in the Metis linker in airflow.

A few issues. First, I'm not sure what a decent page_size is, or whether it should be configurable by users. I guessed 1000, but I don't know what the right number is. Different kinds of updates probably also take different amounts of time: files, which require Magma to phone Metis, generally take much longer to handle than data attributes, so different scripts might need different page sizes.
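
To make the page_size question concrete, here is a minimal sketch of client-side paging. It assumes revisions are a nested `{model: {record: attributes}}` dict; `paginate_revisions` and the `Revisions` alias are hypothetical names for illustration, not the actual etna-py code.

```python
from itertools import islice
from typing import Any, Dict, Iterator

# Assumed shape of an update payload: {model_name: {record_name: {attribute: value}}}
Revisions = Dict[str, Dict[str, Dict[str, Any]]]


def paginate_revisions(revisions: Revisions, page_size: int = 1000) -> Iterator[Revisions]:
    """Yield smaller revision dicts holding at most page_size records each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")

    # Flatten to (model, record, attributes) triples so a page can span models.
    flat = (
        (model, record, attrs)
        for model, records in revisions.items()
        for record, attrs in records.items()
    )

    while True:
        chunk = list(islice(flat, page_size))
        if not chunk:
            return
        page: Revisions = {}
        for model, record, attrs in chunk:
            page.setdefault(model, {})[record] = attrs
        yield page
```

With a helper like this, a script that links files could pass a smaller page_size than one that only updates data attributes.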

The update pagination also doesn't have a very nice sad path: if there are errors in the update, only some of them may be reported, and a partial update may occur. As far as Magma is concerned the pages are independent; the notion of "pages" exists only on the client side.
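
The partial-update problem can be sketched as follows. This is an illustration only: `send_pages`, `PagedUpdateResult`, and the `send_page` callable are hypothetical stand-ins for whatever actually posts one page to Magma. Since each page is an independent request, the most the client can do is record which pages went through.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class PagedUpdateResult:
    succeeded: List[int] = field(default_factory=list)   # indexes of pages that were applied
    failed: Dict[int, str] = field(default_factory=dict)  # page index -> error message

    @property
    def partial(self) -> bool:
        # True when some pages were applied and at least one was not.
        return bool(self.succeeded) and bool(self.failed)


def send_pages(
    pages: Iterable[Any],
    send_page: Callable[[Any], None],  # hypothetical: posts a single page, raises on error
    stop_on_error: bool = True,
) -> PagedUpdateResult:
    """Send pages one at a time and record which ones succeeded or failed."""
    result = PagedUpdateResult()
    for index, page in enumerate(pages):
        try:
            send_page(page)
        except Exception as error:
            result.failed[index] = str(error)
            if stop_on_error:
                # Pages already sent stay applied; later pages are never attempted.
                break
        else:
            result.succeeded.append(index)
    return result
```

With stop_on_error=True, errors in later pages are never seen; with stop_on_error=False, every page's error is reported, but the update can still end up partially applied.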

To test:

  1. Run the Metis Linker on a set of files and validate completion of linking. See here for how to set up Metis Linker in dev.
  2. Change page_size at airflow/opt/providers/etna/etna/etls/metis_linker.py:268 to be less than the number of files and validate completion of linking.

amadeovezz commented 9 months ago

Is the motivation behind this PR to ease the load on Magma by chunking/paging updates?

graft commented 9 months ago

Yes. To clarify: when loading data for a particular project, I found that some file collections numbering in the thousands would cause the operation to time out. I was able to load the data by splitting up the upload using match/touch, but the process is difficult. This PR attempts to smooth that over by chunking updates to Magma so that they do not time out.

amadeovezz commented 9 months ago

Manual testing also passed!

[Screenshots: 2023-10-03 at 4:53:10 PM and 2023-10-03 at 4:43:47 PM]