swiss-art-research-net / skkg-pipeline

ETL pipeline for the Sammlung Digital project

Define pipeline cycles #204

Open fkraeutli opened 1 month ago

fkraeutli commented 1 month ago

There are several tasks that need to be executed regularly once the pipeline is in production. Define appropriate update cycles.

fkraeutli commented 1 month ago

| Task | Description | Update Cycle |
| --- | --- | --- |
| default | Retrieves new data from the MuseumPlus instance and maps it to RDF | daily? |
| create-data-dump, push-latest-data-dump | Combines all generated RDF data into a single TTL file and pushes it to S3 | weekly? |
| update-iiif | Redownloads the IIIF CSV and maps it to RDF | |
| update-vocabularies | Downloads the vocabularies from MuseumPlus, proposes alignments and ingests them into the triple store | |
| create-blazegraph-backup | Creates a local backup of the Blazegraph journal | |
| remove-deleted-items | Queries MuseumPlus for items that have been deleted and removes them | |
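
To keep track of which cycles are still open, the table above could be mirrored in a small lookup like the following. This is only a sketch: the names `PROPOSED_CYCLES` and `undecided_tasks` are hypothetical and not part of the pipeline, and the "daily" and "weekly" values are the tentative proposals from the table, not decisions.

```python
from typing import Dict, List, Optional

# Tentative cycles copied from the table; both are still marked with "?" there.
# None means no update cycle has been proposed for the task yet.
PROPOSED_CYCLES: Dict[str, Optional[str]] = {
    "default": "daily",
    "create-data-dump": "weekly",
    "push-latest-data-dump": "weekly",
    "update-iiif": None,
    "update-vocabularies": None,
    "create-blazegraph-backup": None,
    "remove-deleted-items": None,
}


def undecided_tasks(cycles: Dict[str, Optional[str]]) -> List[str]:
    """Return the tasks that have no proposed update cycle, in table order."""
    return [task for task, cycle in cycles.items() if cycle is None]


if __name__ == "__main__":
    for task in undecided_tasks(PROPOSED_CYCLES):
        print(f"No cycle proposed yet: {task}")
```

Once the cycles are agreed, the same mapping could feed whatever scheduler ends up running the tasks.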