scientific-python / summit-2024


SPEC-8: Supply-Chain Security #9

Open matthewfeickert opened 2 months ago

matthewfeickert commented 2 months ago

Cross reference with https://discuss.scientific-python.org/t/spec-8-supply-chain-security/1163

Copying from @tupui's original post there, areas of focus could be:

drammock commented 2 months ago

@larsoner and I went through the process of getting OpenSSF "silver" for MNE-Python last year, so feel free to bug one of us if you encounter bits of the process that don't make sense, or if you just want to copy our solutions.

matthewfeickert commented 2 months ago

Things that I'm the most interested in are:

betatim commented 1 month ago

I am interested in this topic, mostly in terms of coming up with a few "good bang for buck" recommendations. For example, trusted publishers look like one of these things. Reducing the number of accounts with elevated access seems like another. So do requiring 2FA, managing dependencies (reducing their number, assessing their trustworthiness, etc.), and building social infrastructure against scam tactics. There are probably a few more such low-hanging fruits that could form a guide of concrete actions projects can take, aimed at the casual maintainer rather than at those already much deeper into the subject.

I've taken a few minutes to look at SLSA and OpenSSF and stopped reading after a few minutes of "standard this", "attestation that", "certify blah". They seemed more like checklists you can use to see how you are doing than advice on what to do.

drammock commented 1 month ago

I've taken a few minutes to look at SLSA and OpenSSF and stopped reading after a few minutes...

You're right that a lot of OpenSSF is like that. The most concrete change we made was adding an action that uses Bandit to check for security faux pas. It caught some things. With or without OpenSSF, I'd recommend it.
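
To give a sense of what such a check catches, here is a minimal sketch of one pattern that Bandit reports, to my understanding: calling `eval` on text the program did not write itself. The function names are hypothetical and are not taken from MNE-Python.

```python
import ast


def parse_setting_unsafe(text: str):
    # A static security linter such as Bandit reports this call:
    # eval() executes arbitrary code embedded in `text`.
    return eval(text)


def parse_setting(text: str):
    # ast.literal_eval() only accepts Python literals (numbers, strings,
    # lists, dicts, ...) and raises ValueError for anything else.
    return ast.literal_eval(text)
```

With the second version, `parse_setting("[1, 2, 3]")` returns `[1, 2, 3]`, while a string such as `"__import__('os')"` is rejected instead of being executed.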

tupui commented 1 month ago

Some of these security measures are also meant to improve the trust and transparency in the build/distribution process.

Some things, like the SLSA action, might seem like just badges. But what it provides users with is the assurance that a certain artifact comes from a specific run they can audit. All in all, adding all these small things really bumps the whole system's security, as it becomes very hard for a bad actor to compromise everything, because audit solutions would flag any inconsistency.
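
As a rough illustration of the "artifact comes from a specific run" link, the sketch below checks that an artifact's SHA-256 digest appears among the subjects of a provenance statement. It assumes the statement has already been parsed into a dict with the in-toto layout (`subject` entries carrying a `digest` mapping). It does not verify the statement's signature, which real tooling has to do first. The function name is hypothetical.

```python
import hashlib


def artifact_matches_statement(artifact: bytes, statement: dict) -> bool:
    """Return True if the artifact's SHA-256 digest is listed as a subject."""
    actual = hashlib.sha256(artifact).hexdigest()
    # Assumed layout: {"subject": [{"name": ..., "digest": {"sha256": ...}}]}
    return any(
        subject.get("digest", {}).get("sha256") == actual
        for subject in statement.get("subject", [])
    )
```

A mismatch here is exactly the kind of inconsistency an audit would flag: the file being installed is not the file the recorded build run produced.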

tupui commented 1 month ago

In terms of scope, we also need to keep SPEC 6 in mind. It notably covers permissions, passwords, tokens, 2FA and SSH keys.

We could merge the outcome of this workshop into that SPEC or make a new one, as I propose here.

Carreau commented 1 month ago

I also think one of the key pieces is reproducible builds (https://reproducible-builds.org/), which I'm doing for IPython. It's nice to have things signed, but if a different machine can reproduce the same artifacts, it's even better than signing.

tupui commented 1 month ago

I see it as complementary:

Signing: who
SLSA, trusted publisher: provenance, distribution
Reproducible build: can replicate

Carreau commented 1 month ago

Reproducible build: can replicate

I think it's more than just "can replicate": it's "actually comes from the source", even if you don't trust most of the people who try to reproduce the build. With that, it is sufficient for the release commit to be signed; from it you can derive the validity of the artifacts.

I agree it's complementary, but if the publisher/distributor is compromised, then without reproducibility it's hard or impossible to audit (well, in the case of a pure Python wheel, OK, you just need to unzip it).

Another big difference is that reproducibility is binary/factual, while signing brings in the question of trust on top of signature validity (as well as revocation, expiration and distribution of keys).
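
A minimal sketch of the "binary/factual" point, assuming a toy build step rather than a real wheel builder: two builds count as reproduced only if their bytes, and hence their digests, are identical. The helper names are hypothetical. Timestamps, entry order and file metadata are pinned here because those are common sources of non-determinism in archives.

```python
import hashlib
import io
import zipfile
from typing import Mapping

# Pinned timestamp so the archive does not depend on when it was built.
FIXED_TIMESTAMP = (1980, 1, 1, 0, 0, 0)


def build_archive(files: Mapping[str, bytes]) -> bytes:
    """Build a zip archive whose bytes depend only on the file contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # stable entry order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIMESTAMP)
            info.compress_type = zipfile.ZIP_STORED  # no compressor involved
            info.create_system = 3  # same value on every platform
            info.external_attr = 0o644 << 16  # fixed file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()


def digest(data: bytes) -> str:
    """Hex SHA-256 digest used to compare independently built artifacts."""
    return hashlib.sha256(data).hexdigest()
```

Anyone can rebuild from the tagged sources and compare `digest(...)` values. Agreement needs no trust in the rebuilder, and a mismatch is a fact to investigate.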

tupui commented 1 month ago

Yep, but it's still not a silver bullet. If the sources are compromised right before cutting the release, reproducibility does not cover it: you will reproduce the faulty artifact.

I do like reproducibility and think it should be part of the "package". I just think we need complementary measures, at least for cases where it's difficult to make it happen, like a very exotic build infra.

sethmlarson commented 1 month ago

This is excellent! Here's what I would recommend as areas of focus:

I also think one of the key pieces is https://reproducible-builds.org/. I'm doing it for IPython.

Reproducible builds are definitely nice to have but are usually outside your control, especially when it comes to compiled libraries: the entire toolchain and all dependencies need to support reproducible builds, and the libraries and their versions need to be identical too. Always happy to see progress in this area though!

Carreau commented 4 weeks ago

Note: IIRC git has transfer.fsckObjects = false by default, so a git repo can get corrupted without the devs noticing (to check).
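
One way to do that check locally is sketched below, assuming `git` is on the PATH. The function names are hypothetical. It only looks at the `transfer.fsckObjects` key and treats an unset key as disabled, matching git's documented default.

```python
import subprocess


def interpret_git_bool(returncode: int, stdout: str) -> bool:
    # `git config --get` exits non-zero when the key is unset.
    return returncode == 0 and stdout.strip() == "true"


def fsck_on_transfer_enabled(repo_dir: str = ".") -> bool:
    """Report whether transfer.fsckObjects is enabled for this repository."""
    # Raises FileNotFoundError if git is not installed.
    result = subprocess.run(
        ["git", "-C", repo_dir, "config", "--type=bool",
         "--get", "transfer.fsckObjects"],
        capture_output=True,
        text=True,
    )
    return interpret_git_bool(result.returncode, result.stdout)
```

If this returns `False`, the setting can be turned on with `git config --global transfer.fsckObjects true`.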

matthewfeickert commented 4 weeks ago

Example attestation: https://github.com/scikit-hep/iminuit/pull/993#issuecomment-2145857828