nextflow-io / nf-prov

Apache License 2.0
Use canonical etag for BCO #8

Open bentsherman opened 11 months ago

bentsherman commented 11 months ago

The BCO standard includes an "etag" field, which should be a SHA-256 hash of the BCO manifest. The hash should ideally be "canonical", which in this context means that it shouldn't be affected by incidental details such as the order in which tasks were executed. With the current code, you could perform the exact same run twice and get a different etag purely because of the task ordering.

The etag should be improved by making sure that collections in the BCO manifest are always ordered in a consistent way.
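As a rough illustration of what "ordered in a consistent way" could look like, here is a minimal Python sketch (not the plugin's actual code). It assumes the manifest is a plain JSON-like structure and, as a simplification, treats every list as unordered. The function names `canonicalize` and `canonical_etag` are hypothetical.

```python
import hashlib
import json


def canonicalize(value):
    """Recursively reorder a JSON-like value so equal content serializes identically."""
    if isinstance(value, dict):
        return {key: canonicalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        # Assumption: list order carries no meaning, so sort items by their JSON text.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def canonical_etag(manifest):
    """Return the SHA-256 hex digest of the canonical JSON form of the manifest."""
    text = json.dumps(canonicalize(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two manifests with the same content but different collection order.
first = {"outputs": ["b.txt", "a.txt"], "name": "demo"}
second = {"name": "demo", "outputs": ["a.txt", "b.txt"]}
same = canonical_etag(first) == canonical_etag(second)
```

Any list whose order is meaningful would need to be excluded from the sorting step, so a real implementation would probably sort only specific collections.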

Tasks are currently sorted by task id (which is used for the step number). Since the task id reflects the order in which tasks were executed, this sorting is not compatible with the goals of this issue. If we want to have it both ways, we could compute the etag separately from the actual BCO manifest, so that the manifest keeps its step numbers while the hash ignores them. The downside is that we would have to keep the rendering code in sync with the hashing code. Might need to think more on this.
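To make the "compute the etag separately" option concrete, here is a hedged Python sketch. It hashes a copy of the manifest with the step numbers removed and the steps sorted by content, and leaves the rendered manifest untouched. The flat `pipeline_steps` / `step_number` layout and the function name `etag_ignoring_step_order` are assumptions for illustration, not the plugin's real structure.

```python
import copy
import hashlib
import json


def etag_ignoring_step_order(manifest):
    """Hash a copy of the manifest with step numbers dropped and steps sorted by content."""
    view = copy.deepcopy(manifest)  # the rendered manifest is left unchanged
    steps = view.get("pipeline_steps", [])  # assumed field name
    for step in steps:
        step.pop("step_number", None)  # assumed to be derived from the task id
    steps.sort(key=lambda step: json.dumps(step, sort_keys=True))
    text = json.dumps(view, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same two tasks, executed in a different order in each run.
run_a = {"pipeline_steps": [{"step_number": 1, "name": "align"},
                            {"step_number": 2, "name": "sort"}]}
run_b = {"pipeline_steps": [{"step_number": 1, "name": "sort"},
                            {"step_number": 2, "name": "align"}]}
etags_match = etag_ignoring_step_order(run_a) == etag_ignoring_step_order(run_b)
```

This also shows the maintenance cost mentioned above: the hashing function has to know which fields the renderer derives from execution order, so the two need to change together.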