Describe the feature you'd like to request
Currently, ossd__repositories and ossd__sbom both have a long queue of work to process before they reach completeness. However, if an error occurs, processing restarts at the top of the queue instead of resuming where it left off. We should partition the projects dataframe that is used as input to both of these assets so that the process can be checkpointed.
Describe the solution you'd like
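Split the projects dataframe into partitions, so that each partition can be processed and retried independently and a failure only requires rerunning the affected partition.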
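One way to split the dataframe is to assign each project to a partition using a stable hash of its name, so that a project stays in the same partition from run to run. This is a minimal sketch that assumes the dataframe can be reduced to a list of project names. `partition_key_for` and `partition_projects` are hypothetical helpers, and the partition count is an arbitrary choice. The string keys could, for example, feed an orchestrator's static partition definition. Dagster's `StaticPartitionsDefinition` is one such option, assuming that is how these assets are defined.

```python
import hashlib
from collections import defaultdict


def partition_key_for(project_name: str, num_partitions: int) -> str:
    """Map a project name to a stable partition key (hypothetical helper).

    SHA-256 is used instead of the built-in hash(), which is salted per
    process and would move projects between partitions on every run.
    """
    digest = hashlib.sha256(project_name.encode("utf-8")).hexdigest()
    return str(int(digest, 16) % num_partitions)


def partition_projects(project_names: list[str], num_partitions: int) -> dict[str, list[str]]:
    """Group project names by partition key (hypothetical helper)."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for name in project_names:
        partitions[partition_key_for(name, num_partitions)].append(name)
    return dict(partitions)


if __name__ == "__main__":
    # Illustrative project names only; the real input is the projects dataframe.
    projects = ["project-a", "project-b", "project-c", "project-d", "project-e"]
    for key, names in sorted(partition_projects(projects, 3).items(), key=lambda kv: int(kv[0])):
        print(key, names)
```

Hashing, rather than slicing by row position, keeps existing projects in their partitions when new projects are added. Changing the partition count still reshuffles everything, and partitions may be somewhat uneven in size.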
Describe alternatives you've considered
If this doesn't work as intended, we will need some kind of external state to control restarts.
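The external state could be as simple as a record of which projects have already been processed. On restart, the job skips those projects and resumes with the first unfinished one. This is a minimal sketch that assumes a local JSON file as the store. A real deployment would more likely use a database or object store. `process_with_checkpoint`, `load_completed`, and `save_completed` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_completed(checkpoint_path: Path) -> set[str]:
    """Return the project names recorded as done (assumed JSON-file store)."""
    if not checkpoint_path.exists():
        return set()
    return set(json.loads(checkpoint_path.read_text()))


def save_completed(checkpoint_path: Path, completed: set[str]) -> None:
    """Write the checkpoint atomically so a crash mid-write cannot corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=checkpoint_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(completed), f)
    os.replace(tmp, checkpoint_path)  # atomic rename over the old checkpoint


def process_with_checkpoint(
    projects: Iterable[str],
    process: Callable[[str], None],
    checkpoint_path: Path,
) -> list[str]:
    """Process projects in order, skipping any finished in an earlier run."""
    completed = load_completed(checkpoint_path)
    newly_processed: list[str] = []
    for name in projects:
        if name in completed:
            continue  # done in a previous run; resume past it
        process(name)  # if this raises, the checkpoint still reflects the last success
        completed.add(name)
        save_completed(checkpoint_path, completed)
        newly_processed.append(name)
    return newly_processed
```

The same pattern could also record completed partition keys instead of individual projects. That trades finer-grained resumption for fewer checkpoint writes.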