google / oss-rebuild

Securing open-source package ecosystems by originating, validating, and augmenting build attestations.
Apache License 2.0

Expose the exact normalizations required to achieve reproduction #83

Open msuozzo opened 2 months ago

msuozzo commented 2 months ago

We currently apply every relevant normalization to every artifact, so we neither know nor convey to the user the minimal set of normalizations that was actually necessary to achieve an identical artifact.

Determining the minimal set of normalizations could be tricky, but I think the best strategy would be to decompose our monolithic normalizer into a set of "passes" and, as we normalize each artifact, evaluate the similarity of the artifacts after each pass. When a pass 'does nothing', the degree of similarity should remain unchanged after that pass is applied.

I don't believe this approach would work for passes applied in a random order, but I do think we could order them such that the similarity would increase monotonically.
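
To make the idea concrete, here is a minimal sketch in Python (not the project's actual code). It assumes each pass is a pure `bytes -> bytes` function applied to both the upstream and the rebuilt artifact, and it uses `difflib.SequenceMatcher` as a stand-in similarity metric. The pass names, the `effective_passes` helper, and the sample inputs are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# A pass is a (name, function) pair; the function rewrites one artifact.
Pass = Tuple[str, Callable[[bytes], bytes]]


def similarity(a: bytes, b: bytes) -> float:
    """Stand-in similarity metric in [0, 1]; 1.0 means identical bytes."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def effective_passes(
    upstream: bytes, rebuilt: bytes, passes: List[Pass]
) -> Tuple[List[str], float]:
    """Apply passes in order and record those that change the similarity."""
    needed = []
    score = similarity(upstream, rebuilt)
    for name, fn in passes:
        upstream, rebuilt = fn(upstream), fn(rebuilt)
        new_score = similarity(upstream, rebuilt)
        if new_score != score:
            # The pass moved the artifacts closer (or further apart),
            # so it was not a no-op for this pair.
            needed.append(name)
        score = new_score
    return needed, score


# Hypothetical passes, for illustration only.
PASSES: List[Pass] = [
    ("crlf_to_lf", lambda data: data.replace(b"\r\n", b"\n")),
    (
        "strip_trailing_spaces",
        lambda data: b"\n".join(line.rstrip(b" ") for line in data.split(b"\n")),
    ),
    ("lowercase", lambda data: data.lower()),
]

needed, score = effective_passes(b"Hello \r\nworld\r\n", b"Hello\nworld\n", PASSES)
print(needed, score)
```

In this toy input the line-ending and trailing-space passes each raise the similarity, while the lowercase pass leaves it unchanged and is therefore not reported. Whether the reported set is truly minimal still depends on the pass ordering, as noted above.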

Credit to @hboutemy for the idea 👍