opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Partition GitHub API-related Dagster assets #2415

Open: ravenac95 opened this issue 3 weeks ago


Describe the feature you'd like to request

Currently, ossd__repositories and ossd__sbom both have a long queue to process before they are complete. However, if an error occurs, the run starts again at the top of the queue instead of resuming where it failed. We should partition the projects dataframe that is used as input to both of these assets so that the process can be checkpointed.
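To make the cost of restarting concrete, the following Python sketch counts how many item-processing calls a queue needs before it completes. It compares restarting at the top after a failure (the current behavior described above) with resuming at the failed item (the checkpointed behavior this issue asks for). The function name and the failure model, where an item fails once with a transient error and then succeeds, are hypothetical and not part of the oso codebase.

```python
def calls_until_complete(n_items: int, failures: set[int], resume: bool) -> int:
    """Count process() calls needed to get through a queue of n_items.

    failures: indices that fail once with a transient error and succeed on retry.
    resume=False restarts at the top of the queue after a failure (current behavior);
    resume=True restarts at the failed item (checkpointed behavior).
    """
    pending = set(failures)  # failures that have not happened yet
    start = 0
    calls = 0
    while True:
        for i in range(start, n_items):
            calls += 1
            if i in pending:
                pending.discard(i)  # this item will succeed on the next attempt
                start = i if resume else 0
                break
        else:
            # The loop finished without a failure: the queue is complete.
            return calls


if __name__ == "__main__":
    print(calls_until_complete(10, {7}, resume=False))  # 18
    print(calls_until_complete(10, {7}, resume=True))   # 11
```

With 10 items and a single transient failure at index 7, restarting from the top costs 18 calls while resuming costs 11; the gap grows with the length of the queue and the number of failures.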

Describe the solution you'd like

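Split the projects dataframe into partitions, so that each partition can be materialized (and retried) independently and a failure only requires re-running the affected partition.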
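A minimal sketch of one way to split the projects into partitions, assuming each project is identified by a unique name string (a plain list stands in for the projects dataframe here). Each project is assigned to one of a fixed number of buckets using a stable CRC32 hash, so the assignment is identical on every run. NUM_PARTITIONS, partition_key, split_projects and projects_for_partition are hypothetical names chosen for illustration.

```python
import zlib
from collections import defaultdict

# Hypothetical partition count; changing it reassigns most projects to new buckets.
NUM_PARTITIONS = 8


def partition_key(project_name: str, num_partitions: int = NUM_PARTITIONS) -> str:
    """Map a project name to a stable, zero-padded partition key such as "03"."""
    # crc32 is deterministic across processes, unlike the salted built-in hash().
    bucket = zlib.crc32(project_name.encode("utf-8")) % num_partitions
    return f"{bucket:02d}"


def split_projects(
    project_names: list[str], num_partitions: int = NUM_PARTITIONS
) -> dict[str, list[str]]:
    """Group project names by partition key, preserving input order within each group."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for name in project_names:
        partitions[partition_key(name, num_partitions)].append(name)
    return dict(partitions)


def projects_for_partition(
    project_names: list[str], key: str, num_partitions: int = NUM_PARTITIONS
) -> list[str]:
    """Return only the projects belonging to one partition, as a single partitioned run would."""
    return [name for name in project_names if partition_key(name, num_partitions) == key]


if __name__ == "__main__":
    projects = ["project-a", "project-b", "project-c"]  # hypothetical project names
    for key, names in sorted(split_projects(projects).items()):
        print(key, names)
```

zlib.crc32 is used instead of the built-in hash() because Python salts string hashes per process, which would move projects between partitions from one run to the next. In Dagster, these keys could back a StaticPartitionsDefinition, with each partitioned run of the asset calling projects_for_partition with context.partition_key; that wiring is an assumption, not the repository's current code.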

Describe alternatives you've considered

If this doesn't work as intended, we will need some kind of external state to control restarts.
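The external state mentioned above could be as simple as a checkpoint that records which items have finished, so a rerun skips them. The sketch below assumes items are identified by strings and that a local JSON file is acceptable storage; a deployed setup would more likely keep this state in shared storage such as a database or object store. process_with_checkpoint, load_done and save_done are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_done(path: Path) -> set[str]:
    """Read the set of completed item IDs, or an empty set if no checkpoint exists yet."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def save_done(path: Path, done: set[str]) -> None:
    """Persist completed IDs via a temp file in the same directory, then os.replace it."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_name, path)


def process_with_checkpoint(
    items: Iterable[str],
    process: Callable[[str], None],
    checkpoint_path: Path,
) -> set[str]:
    """Process items in order, skipping any already recorded in the checkpoint file.

    If process() raises, the exception propagates; everything finished before
    the failure is already saved, so the next call resumes at the failed item.
    """
    done = load_done(checkpoint_path)
    for item in items:
        if item in done:
            continue
        process(item)
        done.add(item)
        save_done(checkpoint_path, done)
    return done
```

Writing the checkpoint to a temporary file and swapping it in with os.replace means a crash mid-write leaves the previous checkpoint intact instead of a truncated JSON file. Saving after every item trades extra writes for fine-grained resumption; batching the saves would reduce I/O at the cost of redoing a few items after a failure.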