ocaml / opam

opam is a source-based package manager. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow.
https://opam.ocaml.org

Check the original checksums on the fallback archives from Software Heritage #5720

Open kit-ty-kate opened 11 months ago

kit-ty-kate commented 11 months ago

The Software Heritage fallback added in #4859 gives opam the ability to fetch archives from Software Heritage.

Currently such archives are (for reasons that escape me [1]) not backups of the original archives but backups of the untarred contents, which are re-tarred on request. As a result, the archives lose their original checksums, and reproducing those checksums deterministically is close to impossible because file ordering and metadata have changed.
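To illustrate why re-tarring loses the checksum, here is a minimal Python sketch (not related to opam's or Software Heritage's actual code; the file names and timestamps are made up). It builds three tarballs holding identical file contents, changing only the member order or the modification time, and compares their SHA-256 digests.

```python
import hashlib
import io
import tarfile


def make_tar(files, mtime):
    """Build an uncompressed tar in memory from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # metadata stored in the tar header
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def extracted_contents(tar_bytes):
    """Return {member name: file bytes}, ignoring order and metadata."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


# Hypothetical package contents, for illustration only.
files = [("pkg/a.ml", b"let a = 1\n"), ("pkg/b.ml", b"let b = 2\n")]

original = make_tar(files, mtime=1_000_000_000)
reordered = make_tar(list(reversed(files)), mtime=1_000_000_000)
retouched = make_tar(files, mtime=1_700_000_000)

# Same extracted contents, but the archive checksums differ.
print(sha256(original) == sha256(reordered))
print(sha256(original) == sha256(retouched))
print(extracted_contents(original) == extracted_contents(reordered))
```

The extracted files are byte-for-byte identical in all three cases, yet a checksum recorded against the original tarball cannot validate either of the other two.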

There is a long-standing upstream issue that hopes to fix this in the medium to long term: https://gitlab.softwareheritage.org/swh/devel/swh-model/-/issues/2430

I personally think we should:

[1]: I’m guessing it’s for space efficiency, but still...

hannesm commented 11 months ago

Thank you for opening this issue. I was not aware that these archives are used as a fallback without verifying the checksum. Would it be possible to guard this behaviour with yet another command-line option (i.e. not unsafe-yes, but something like no-checksum-for-software-heritage)? AFAICT unsafe-yes is needed for interactive usage of opam, while I really have no interest in using source code whose checksum wasn't verified (I prefer installation to fail in that case).

Thanks a lot.

rjbou commented 11 months ago

During validation, checksums are not checked because they can't be used. A different mechanism is in place for the SWH fallback: we rely on the swhid given in the opam file to download the archive. That swhid is a unique identifier computed from the content of the archive, and it is provided by the maintainer. So when we download the SWH archive, we recompute the swhid on the untarred archive in order to validate it (no corruption).
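As a rough illustration of what "computed from the content" means, the sketch below derives a Software Heritage identifier for a single file. Content identifiers use the same hashing scheme as a Git blob (SHA-1 over a "blob <length>\0" header followed by the bytes). This is only the leaf case: an identifier for a whole untarred archive would presumably be a directory identifier built by hashing the tree of such entries, which this sketch does not attempt. The function name is hypothetical and this is not opam's implementation.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the swh:1:cnt: identifier for a single file's bytes.

    Uses the Git blob scheme: SHA-1 of b"blob <len>\\0" + data.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def matches(data: bytes, expected_swhid: str) -> bool:
    """Recompute the identifier and compare it with the declared one."""
    return swhid_for_content(data) == expected_swhid


# The empty file has the well-known Git blob hash e69de29b...
print(swhid_for_content(b""))
```

Because the identifier depends only on file bytes (and, for directories, on names and tree structure) rather than on tar ordering or timestamps, it can still be recomputed after the archive has been unpacked and re-tarred.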

As for the fallback itself, it can be disabled using opam option swh-fallback=false.

rjbou commented 10 months ago

Some clarification, after a long discussion :)

There was a misunderstanding about how Software Heritage is used and about the fallback implemented in opam. The fallback in opam is triggered if and only if an swhid is already present in the opam file. That swhid was added by a maintainer, usually by computing it from the archive used for the release. That's why we rely on the swhid present in the opam file (and we check it to be sure that the archive matches the swhid); on the opam side, it is safe to use. Opam does not retrieve archives from SWH on its own.

But that safe-to-use guarantee is not fully met today: no check is done in opam repo CI, in publication tools, etc. At the beginning, the Software Heritage & OCaml story contained:

  1. addition of opam repo in SWH
  2. some tooling to ease generating/checking swhids
  3. fallback on opam
  4. addition on opam repo ci for checks & proposals
  5. addition in publication tools

But it was done (and funded) only up to point 2. So at the moment, there is no support in opam repo, nor in publication tools. As a result, 0 packages in opam repo contain an swhid.

That said, swhid fallback retrieval still relies heavily on repos and maintainers: maintainers need to provide the correct swhid, repos need to check it, and some tooling needs to be written to help with these tasks.

Until the opam repo & publication tools are upgraded, we propose to change the default by deactivating the SWH fallback and, when an opam file contains an swhid and the archive is missing, to display a note informing the user that the SWH fallback can be enabled at their own risk.
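The proposed behaviour can be summarised as a small decision function. This is a hypothetical Python model of the rules described in this thread, not opam's code; the function name, parameter names, and returned labels are all invented for illustration.

```python
from typing import Optional


def fallback_action(
    archive_available: bool,
    swhid: Optional[str],
    swh_fallback: bool = False,  # proposed default: fallback disabled
) -> str:
    """Model what happens when a package's source archive is requested."""
    if archive_available:
        # Normal path: the original archive is fetched and checksummed.
        return "use-archive"
    if swhid is None:
        # No swhid in the opam file: SWH is never consulted.
        return "fail"
    if swh_fallback:
        # Opt-in path: fetch from SWH, then recompute and compare the swhid.
        return "fetch-from-swh-and-verify-swhid"
    # Fallback disabled but possible: fail, and tell the user about the option.
    return "fail-with-note"


print(fallback_action(False, "swh:1:dir:<id>"))
```

With the fallback off by default, a missing archive always results in a failure, and Software Heritage is only contacted when the user has explicitly opted in and the maintainer has supplied an swhid.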

kit-ty-kate commented 6 months ago

The Software Heritage fallback was disabled by default in #5899, so I'm moving this issue off the 2.2 milestone.