oslokommune / devportal-harvest-poc

utviklerportalen

Proof of concept for harvesting and distributing API metadata

Figure: Data flow diagram

Figure: Pipeline flow diagram

Harvest

More information regarding harvesting can be found here

Distribute

More information regarding distribution can be found here

Preparation

Run kubectl apply -f charts/harvest-output-pvc to create the PVC (PersistentVolumeClaim). All harvesters write their output to this PVC, and the latest_provider and the distributors read from it.

Manually create the following folders on the PVC: mkdir -p /dataservice/{10_raw,20_aggregations,30_result}

Stack

The harvest.py script reads a sources.yaml file and generates one CronJob per source. Each CronJob writes its output to a persistent volume.
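To illustrate the idea of turning source entries into CronJobs, here is a minimal sketch. It is not the actual harvest.py. The source fields (name, schedule, image), the claim name harvest-output-pvc, and the mount path /dataservice are assumptions made for this example, and the sources are given as an already parsed Python list instead of being loaded from sources.yaml.

```python
import json

# Hypothetical parsed content of sources.yaml; the real schema is not shown in this README.
SOURCES = [
    {"name": "example-api", "schedule": "0 * * * *", "image": "example/harvester:latest"},
]


def build_cronjob(source, claim_name="harvest-output-pvc", mount_path="/dataservice"):
    """Return a Kubernetes CronJob manifest (as a dict) for one source entry.

    claim_name and mount_path are assumed values, not taken from the repository.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"harvest-{source['name']}"},
        "spec": {
            "schedule": source["schedule"],
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {
                                    "name": "harvester",
                                    "image": source["image"],
                                    # The harvester writes its output below this mount.
                                    "volumeMounts": [
                                        {"name": "output", "mountPath": mount_path}
                                    ],
                                }
                            ],
                            "volumes": [
                                {
                                    "name": "output",
                                    "persistentVolumeClaim": {"claimName": claim_name},
                                }
                            ],
                        }
                    }
                }
            },
        },
    }


# One manifest per source; JSON output is accepted by kubectl apply -f.
manifests = [build_cronjob(source) for source in SOURCES]
print(json.dumps(manifests[0], indent=2))
```

Emitting JSON keeps the sketch free of third-party dependencies; a real script would more likely render YAML or Helm templates.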

On a GET request to /apis, the latest_provider service returns the combined contents of all the .json files in the persistent volume claim described under Preparation.
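One way such an endpoint could work is sketched below using only the Python standard library. This is an assumption-laden illustration, not the real latest_provider: the directory /dataservice/30_result, the port 8080, and the choice to return a JSON list with one element per file are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Assumed location of the harvested .json files on the mounted PVC.
DATA_DIR = Path("/dataservice/30_result")


def collect_apis(directory):
    """Parse every .json file directly under `directory` into one list, ordered by filename."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    ]


class ApisHandler(BaseHTTPRequestHandler):
    """Answers GET /apis with the combined content of all .json files."""

    def do_GET(self):
        if self.path != "/apis":
            self.send_error(404)
            return
        body = json.dumps(collect_apis(DATA_DIR)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), ApisHandler).serve_forever()
```

Calling serve() starts the endpoint. Because the files are read on every request, newly harvested output shows up without a restart, at the cost of re-reading the directory each time.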

The harvester frontend sends a GET /apis request to the latest_provider service and presents the result.
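The request the frontend makes can be reproduced from a script, which is handy for checking the service by hand. The sketch below assumes that the response body is JSON; the function name fetch_apis and the base URL in the usage note are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_apis(base_url, timeout=5.0):
    """Send GET /apis to the given base URL and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/apis"
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_apis("http://localhost:8080") would query a latest_provider instance exposed on local port 8080, if one is running there.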

Roadmap