Open cisaacstern opened 2 years ago
I believe there are only two remaining prerequisites before the actual work of this feature can begin in pangeo-forge-recipes:
Once these two items are addressed, work can begin on an appending feature in pangeo-forge-recipes.
@rabernat Per our discussion in the call yesterday, I'm including some more details on our AWS-ASDI specific use cases here rather than in a new ticket on pangeo-forge-recipes. For many of the reference indexes we are generating for data in the AWS PDS buckets (https://github.com/pangeo-forge/staged-recipes/issues/208), we'll need to periodically update the index as new data becomes available. In almost all of our cases, this will involve expanding the index's time dimension. I think our use case is a bit atypical in that most of the buckets where these datasets live have event notifications configured for new keys, which allows us to monitor data being added. Originally I had envisioned us queuing these event notifications and periodically sending a block of new files to pangeo-forge for appending to the target archive. This would be great for our use case, but I don't think it generalizes well for most users. Instead I think we'll likely need a process that
`FilePattern` for the data to append. This assumes that the recipe's concat dim is temporal, and we'd likely need to restrict the append-only cron configuration to recipes where this is true.
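As a rough illustration of working out which time steps a scheduled append run would need to cover, here is a minimal sketch. It assumes daily files and a temporal concat dim; `URL_TEMPLATE`, `dates_to_append`, and `urls_to_append` are hypothetical names invented for this example and are not part of pangeo-forge-recipes. The resulting list of URLs is the kind of input from which a `FilePattern` for the append could be built.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical URL template: the real bucket layout is not specified in this thread.
URL_TEMPLATE = "s3://example-bucket/data/{day:%Y/%m/%d}.nc"


def dates_to_append(last_in_target: date, latest_available: date) -> List[date]:
    """Return the daily time steps after `last_in_target`, up to and
    including `latest_available`. Returns an empty list if nothing is new."""
    n_days = (latest_available - last_in_target).days
    return [last_in_target + timedelta(days=i) for i in range(1, n_days + 1)]


def urls_to_append(last_in_target: date, latest_available: date) -> List[str]:
    """Map each new time step to a source file URL (assumes one file per day)."""
    return [
        URL_TEMPLATE.format(day=d)
        for d in dates_to_append(last_in_target, latest_available)
    ]
```

A cron-triggered run could read the last time value already in the target dataset, call `urls_to_append` with that value and the current date, and skip the run entirely when the list is empty.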
@cisaacstern has linked most of the related issues above, but I'll include the more recent Beam-specific issue here for tracking as well: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/447
User Profile
As a recipe maintainer
User Action
I want to re-run recipes in my feedstock (either manually or on a schedule) to append newly released data to my dataset
User Goal
So that I can keep the dataset built by my feedstock up-to-date with the latest releases from the data provider without needing to re-run the entire recipe
Acceptance Criteria
The ability to trigger append-only production runs (manually or on a schedule) from a feedstock. This might be inferred from the recipe itself, or perhaps specified by a new property in the `meta.yaml`.
Linked Issues