leap-stc / cmip6-leap-feedstock


Large number of iids presenting new challenges #148

Open jbusecke opened 4 months ago

jbusecke commented 4 months ago

#145 seems to have unblocked a lot of the iids that were previously unavailable! A big win in general, but we need to work on some parts of our infrastructure.

This has led to two issues:

Speed will become more pertinent as we add more iids over time. In particular, the 'parsing' step, where we go from the input list (with wildcards and brackets) to a list of single iids, will produce more and more requests on each run of the deployment action. The following steps should get more manageable over time, since we prune off the iids that are already ingested.

We are also currently handling this fairly inefficiently: we basically query for the dataset info twice, once in expand_instance_id_list and then again in get_recipe_inputs_from_iid_list (which currently takes a list of instance ids).

Going forward, we should probably query once and extract something like { 'instance_id': {'id':..., 'field_a':..., }, 'other_instance_id':{'id':..., 'field_a':..., }, ... }

This would make it trivial to prune off existing iids and then pass only the 'id' fields to get_recipe_inputs_from_iid_list.
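A rough sketch of what that pruning could look like, assuming the mapping above has already been built from a single query. The function name `prune_and_select_ids`, the `existing_iids` set, and all sample values are made up for illustration; none of this is the current feedstock code.

```python
from typing import Dict, List, Set


def prune_and_select_ids(
    iid_info: Dict[str, dict],
    existing_iids: Set[str],
) -> List[str]:
    """Drop already-ingested instance ids and return the remaining 'id' fields.

    `iid_info` is assumed to look like
    {'instance_id': {'id': ..., 'field_a': ...}, ...}.
    """
    return [
        info["id"]
        for iid, info in iid_info.items()
        if iid not in existing_iids
    ]


# Placeholder data, not real CMIP6 instance ids.
iid_info = {
    "iid_a": {"id": "dataset-1", "field_a": "x"},
    "iid_b": {"id": "dataset-2", "field_a": "y"},
    "iid_c": {"id": "dataset-3", "field_a": "z"},
}
already_ingested = {"iid_b"}

ids_to_process = prune_and_select_ids(iid_info, already_ingested)
```

The resulting `ids_to_process` list is what would then be handed to get_recipe_inputs_from_iid_list, so the dataset info only has to be fetched once.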

jbusecke commented 4 months ago

#149 implements the bq batching in the recipe, but we are still waiting for a proper fix in https://github.com/leap-stc/leap-data-management-utils/issues/33
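For anyone following along, the general idea of batching is just to split the list of iids into fixed-size chunks and issue one query per chunk instead of one per iid. A minimal sketch is below; the helper `batched` and the batch size are hypothetical and are not taken from the actual change.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder iids; each batch would be passed to a single lookup query.
iids = ["iid_a", "iid_b", "iid_c", "iid_d", "iid_e"]
batches = list(batched(iids, batch_size=2))
```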

jbusecke commented 4 months ago

Waiting on the results from the merged #149 to see if we can handle a bunch more iids (even though it might be slow for now).

jbusecke commented 4 months ago

https://github.com/leap-stc/cmip6-leap-feedstock/actions/runs/8990218806 is actually highly successful! 200+ datasets ingested and still going strong!