pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Roadmap update for TUF support #5247

Closed LucidOne closed 2 years ago

LucidOne commented 5 years ago

Is it possible to get an update on the development roadmap about when TUF or other encryption support might be deployed? Thanks!

Also, it appears that tomorrow will be 6 years since January 5, 2013.

nealmcb commented 5 years ago

Thanks - good question. I note that TUF is one of the options noted under Add support for API keys · Issue #994 · pypa/warehouse which appears under the "Cool but not urgent" milestone. I agree that TUF addresses many of the nicely described PyPI-specific concerns that @dstufft wrote way back on 23 July 2013: Why Package Signing is not the Holy Grail

LucidOne commented 5 years ago

I believe that the package signing issues stated can be resolved, and in 2019 our internet depends on the security of infrastructure such as PyPI. If there is not a roadmap for TUF support, I'm going to look at solving the problem.

MyNameIsCosmo commented 5 years ago

Perhaps we can evaluate integration with third-party services for verification? Keybase.io helps create trust by building a verified profile attached to a GPG key across multiple services. GitHub handles GPG signature verification for commits, refs, releases, etc. (although your GH PGP key is kept in your GH settings... 2FA could help mitigate unauthorized account access). Falling back, you can have a PGP key on MIT's PGP keybase which forwards a public key to key servers around the world.

Now, this does assume putting trust in third-party services themselves (and the author of the actual PGP key), but the chances are very slim that your PGP key would be replaced on multiple services at once. The biggest issue is an author being unable to secure their private key (no password, weak password, bad private key handling, etc...), or their account becoming compromised (e.g. GitHub).

This doesn't solve the problem on PyPi's side of verifying someone, although PyPi shouldn't have to verify anyone. PyPi should host the package. It should display information about the package being signed (or unsigned), the package's origins and contents, and whoever is downloading the package should check that key against their trusted sources. In the past, PyPi has hosted malicious code. Of course, this happens. It is expected. Other package managers suffer as well.


I'm sure we all can go on about this. Sure, PGP signing doesn't guarantee security. It doesn't guarantee verification. It doesn't even guarantee safe, robust, reliable, ... code. PGP is just a part of a solution to a complex problem. As signing becomes supported by major services (package maintainers, source code repositories, document services, ...), a trust network (keybase, key servers, ...) can be utilized to help security-minded users track where their packages are coming from.

If a user or a maintainer doesn't care about signing, it doesn't have to be enforced at the beginning. If the feature is there, people will use it.

Edit: A great talk - How Much Do You Trust That Package?

LucidOne commented 5 years ago

I am convinced that this problem is tractable and I think the key here is that EigenTrust can provide us a mathematical model to get started.

We already have some data in the git commit history. When someone signs a git commit there is an EigenTrust metric that can be calculated for all of the previous signed commits. There is reason to believe that causality and techniques like Bayesian inference may also be useful here, where multiple commits are signed by a key we do not yet trust but are temporally followed by a key we do trust. We can also federate our memories of git history and checksums to detect anomalies and attacks on infrastructure.
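To make the EigenTrust idea concrete, here is a minimal sketch of the power-iteration step at its core. It is only an illustration under assumptions not stated in this thread: the `local_trust` matrix (how much one signing key vouches for another, for example counted from signed commits that build on each other), the key names, and the `alpha` damping value are all hypothetical, and every key is assumed to have its own row.

```python
def eigentrust(local_trust, pretrusted, alpha=0.15, iterations=50):
    """Return a global trust score per key via power iteration.

    local_trust[i][j] is a non-negative count of how much key i vouches
    for key j (hypothetical: derived from signed commit history).
    pretrusted is the set of keys trusted a priori.
    """
    keys = sorted(local_trust)
    # Prior distribution: uniform over the pre-trusted keys.
    prior = {k: (1.0 / len(pretrusted) if k in pretrusted else 0.0) for k in keys}

    # Normalise each row so a key's outgoing trust sums to 1.
    # A key that vouches for nobody falls back to the prior.
    normalised = {}
    for i in keys:
        total = sum(local_trust[i].values())
        if total > 0:
            normalised[i] = {j: local_trust[i].get(j, 0.0) / total for j in keys}
        else:
            normalised[i] = dict(prior)

    trust = dict(prior)
    for _ in range(iterations):
        # Blend propagated trust with the prior so pre-trusted keys anchor the result.
        trust = {
            j: (1 - alpha) * sum(trust[i] * normalised[i][j] for i in keys)
            + alpha * prior[j]
            for j in keys
        }
    return trust


# Hypothetical data: mallory only vouches for herself and nobody vouches for her.
signed_history = {
    "alice": {"bob": 3},
    "bob": {"alice": 1, "carol": 1},
    "carol": {},
    "mallory": {"mallory": 5},
}
scores = eigentrust(signed_history, pretrusted={"alice"})
```

In this toy graph a key that is only self-endorsed gains no trust, because trust flows outward from the pre-trusted set. Choosing that pre-trusted set is the hard policy question and is not addressed by the sketch.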

Sites like GitHub and sr.ht can already provide trust metrics by validating that an email address is connected to a GPG key. We can formalize and automate our existing manual heuristics for validating packages.

Further, I think we should start thinking about metadata for developers and organizations that produce software. Organizations can maintain a META-DEV repository that contains the PGP signing keys of active developers, key revocation information, and per-repository release manager designation. Developers can also maintain a META-DEV repository that contains yaml (or whatever) metadata linking to a developer's twitter / blog / instagram / keybase / identi.ca / matrix / Mastodon / xmpp / whatever where the PGP fingerprint can be posted.

Perhaps we need to start building keyrings in OS native packaging formats (.deb, .rpm, etc) so that trust can at least be established for the most critical python packages.

This is a complex problem, but we need to take what small steps forward we can instead of waiting another year to secure PyPI or even bother to figure out certificate pinning.

brainwane commented 5 years ago

Thank you to everyone who's raised this issue and shared their thoughts and useful resources! And sorry for the slow response!

Short answer: we'll be discussing TUF & Warehouse much more in April.

Longer answer:

The folks working on Warehouse have gotten funding to concentrate on improving Warehouse's security, and have kicked off work (funded by the Open Technology Fund) towards multi-factor auth, API keys, and an audit trail. And -- to quote the blog post --

Facebook... has provided the Python Software Foundation with a monetary gift that will be used to fund the development and deployment of enhanced security features to PyPI....

The PSF Packaging Working Group plans to use these funds to implement highly requested security features in PyPI such as cryptographic signing and verification of files uploaded and installed from the index. Additionally, systems for the automated detection of malicious uploads will lower the time to response and improve the resiliency of PyPI against attacks such as "pytosquatting".

This work will be undertaken in the second half of 2019 but planning will begin in the second quarter of the year.

We anticipate that in mid-April (so, basically within about a month) we'll be announcing a formal Request For Information to ask people to tell us about their interest in being contracted to do this work, and that part of that discussion will be further, more detailed conversations about whether TUF is the right tool for this job. So please watch for that, on this issue and on https://discuss.python.org/c/packaging .

(cc @pradyunsg since I think you're interested in this.)

JustinCappos commented 5 years ago

From the TUF side, we're very interested in moving this forward. Let us know what we can do to help!

LucidOne commented 5 years ago

https://motherboard.vice.com/en_us/article/pan9wn/hackers-hijacked-asus-software-updates-to-install-backdoors-on-thousands-of-computers

trishankatdatadog commented 5 years ago

Same, happy to help with this, just let us know how.

westurner commented 5 years ago

"PyPI security work: multifactor auth progress & help needed" https://discuss.python.org/t/pypi-security-work-multifactor-auth-progress-help-needed/1042/10

brainwane commented 5 years ago

At PyCon sprints several people spoke about the potential future of TUF in Warehouse and Python packaging, and put notes at https://docs.google.com/document/d/1Wz2-ECkicJgAmQDxMFivWmU2ZunKvPZ2UfQ59zDGj7g/ .

trishankatdatadog commented 5 years ago

Thanks, Sumana!

I'm happy to help with the design and coding for this project @lukpueh @awwad

ofek commented 5 years ago

I'll also devote whatever time is necessary to get this done

lukpueh commented 5 years ago

Thanks for putting together our notes, @brainwane! It was a pleasure meeting you guys at PyCon. @ewdurbin, any news on the RFI?

@ofek and @trishankatdatadog, your help will be very much appreciated.

brainwane commented 5 years ago

Please check out the newly posted Request for Interest regarding upcoming work implementing cryptographic signing and malware detection on PyPI.

Our current timeline:

Date          Milestone
August 28     Request for Information period opens.
September 18  Request for Information period closes.
September 23  Request for Proposal period opens.
October 16    Request for Proposal period closes.
October 29    Date proposals will have received a decision.
November 30   Contracts for accepted proposals should be finalized.
December 2    Contract work commences.

And then we intend to complete the project over a three to five month period, beginning December 2019.

We're hoping to get participation from potential participants and other experts in the discussion forum, especially about implementation questions, including which of the TUF PEPs (if either) to implement!

brainwane commented 4 years ago

See the PSF's new blog post & the open RFP. Later this year, PyPI wants to start:

  • Implementation of PEP 458 once accepted to add integration of The Update Framework to PyPI
  • Development of either a stand alone service or code in the Warehouse codebase to create, sign, serve, and handle caching concerns for TUF metadata
  • Development of necessary code in the Warehouse codebase to integrate TUF metadata and signing

This means we need to move PEP 458 from "Deferred" to "Accepted" status. Per @ewdurbin's guidance, this means we'll need to get PEP 458 revised, as necessary, to pin down specifics, such as key distribution (who, where, how many?) plus any technical choices that TUF leaves up to implementations. To revise PEP 458 and get it accepted, we'll need to collaborate with previous implementers and other experts.

Given the RFP timeline the latest we should get the PEP accepted is 2 December 2019, but I'd much prefer we get it accepted by mid-October.

trishankatdatadog commented 4 years ago

Thanks for the update, @brainwane!

@JustinCappos @lukpueh Ok, so we have our work ahead of us. I have work obligations to meet, but can devote whatever time I can for this. Let's plan ASAP.

ofek commented 4 years ago

Let me know if you need more assistance, I'd be glad to help!

brainwane commented 4 years ago

Now that the PEP is back in Draft status*, I think the next steps are for one or more of the PEP authors to:

@dstufft is now the BDFL-Delegate for this PEP so it'll have to be the other authors (@trishankatdatadog, @vladimir-v-diaz, @JustinCappos) who push this forward. If we want to get any revisions done and get Donald to accept the PEP by mid-October then you should start the steps above in the next couple days, in my opinion.

* the version at python.org needs to be re-generated, but https://github.com/python/peps/pull/1177 was accepted

brainwane commented 4 years ago

A few of us had a chat today and are working to update the PEP (https://github.com/python/peps/pull/1178 is part of that), and one or more of the PEP authors will be reaching out to @ewdurbin with a few questions.

pradyunsg commented 4 years ago

@brainwane FYI - I'm happy to help with implementing functionality on the client side (i.e. pip) when we get to that point.

I think we'd want to create a tracking issue on pip's issue tracker to have implementation related discussions there, after the PEP is accepted (AFAICT how clients interact with TUF-enabled PyPI is covered by the PEP and would be discussed in the discussions on discuss.python.org).

trishankatdatadog commented 4 years ago

Hi @ewdurbin and @dstufft, we have a few questions for which we could use your help:

  1. By "key distribution," do you mean who manages how many keys and where on the PyPI side? Do you also mean how package managers such as pip would determine which keys to trust in the first place?
  2. What is the current deployment process for Warehouse? This will help us determine how to edit the PEP to discuss how to update the TUF-specific code in Warehouse.
  3. What else would you like to see more about, or see changed in the PEP?

Thanks for your time!

trishankatdatadog commented 4 years ago

@ewdurbin @dstufft I have also sent you both an email about a conference call this week, if possible. Thanks!

brainwane commented 4 years ago

Current status: https://github.com/python/peps/pull/1203 is awaiting review from @dstufft to revise PEP 458. After that, there needs to be a discussion on https://discuss.python.org/c/packaging to get the PEP from "Draft" to "Accepted".

In order to make implementation easier, Dustin wants to work towards implementing #726 (removing Test PyPI from our infrastructure will make key stuff far easier). @di will be speaking more on that in the relevant issue soon.

And, starting in December, @ewdurbin will be managing the contractors who will implement TUF on PyPI. Then the first big key ceremony will be in April at PyCon North America -- if you haven't put PyCon on your calendar yet, you probably should! Conference registration will open later this month.

brainwane commented 4 years ago

The TUF team is now responding to Donald's reviews, and has a discuss.python.org thread up where people can discuss the draft PEP and suggest it be accepted.

PyCon 2020 registration is now open. Tickets do sell out, so if you intend to go, you should register soon.

brainwane commented 4 years ago

@ncoghlan is now the sponsor of PEP 458, and @mnm678 is the PEP coauthor leading discussion/responses in the Discourse thread about the PEP. @dstufft is the BDFL-Delegate which means he's the one who will decide whether to formally accept it.

@pradyunsg @ofek @ncoghlan @ewdurbin -- could use your help with some document feedback. Since some conversation about the PEP has included comparing TUF to Transparent Logs approaches like Certificate Transparency, the TUF team has written a comparison: "Compromise detection vs resilience: contrasting Transparent Logs and The Update Framework". They're seeking feedback on that document (does it make sense and feel fair? should there be rows in the comparison matrix that they're currently missing? can you help clarify the bits currently marked as "unclear"?). Soon (in the next few days, I hope?) they'll link to it in the PEP discussion thread on Discourse.

And: PyCon! Please consider joining us at the Packaging Summit on Thursday, April 16th as well as other events during PyCon North America 2020. PyCon offers financial assistance in case you need help with travel, lodging, etc. -- apply by 31 January (the end of this month) if you need it.

ncoghlan commented 4 years ago

Comparison doc generally looks good to me, but I'm not sure I'm the best judge of how well it is conveying information, since I already understand the trade-offs.

The way I'd summarise the overall conclusion is that we consider the two approaches as complementary because an online transparency log ensures the integrity of historical metadata for all packages, while opting in to managing an offline signing key provides protection of both historical and future metadata for publishers that choose to take on that extra responsibility.

(The doc already says essentially that, it just doesn't quite phrase it that way)

mnm678 commented 4 years ago

Thanks for all the feedback! We converted the document to a blog post that is available here.

pradyunsg commented 4 years ago

The blog post looks great to me! Thanks everyone who worked on it! ^.^

It does explain the trade-offs pretty well IMO, as someone who has working knowledge of this topic but does not have the understanding of these specific systems, like @ncoghlan does. :)

graingert commented 4 years ago

the blog seems to imply that it's a good idea to use TUF+TL at the same time

JustinCappos commented 4 years ago

[FYI: I'm CCing the post authors in case they want to weigh in here.]

I'd say that you could get some value by combining them, but you have to weigh if it's more important to 1) have a list of packages released that is somewhat harder to change but have a large potential for damage and no secure way to recover (TL) or 2) be able to restrict damage when there is a compromise and securely recover from such an attack (TUF).

You could use both together (which has higher operational overhead than either separately) and see security benefits. However, I'd certainly argue TUF provides the far more important protections.

I'm not aware of any of the TUF deployments https://theupdateframework.com/adoptions/ which have chosen to also use TL.

On Tue, Feb 11, 2020 at 8:45 AM Thomas Grainger notifications@github.com wrote:

the blog https://ssl.engineering.nyu.edu/blog/2020-02-03-transparent-logs seems to imply that it's a good idea to use TUF+TL at the same time


trishankatdatadog commented 4 years ago

Thanks for pinging us @JustinCappos

@graingert Right, I agree with Justin: for an open source repository like PyPI, using TUF gives you the guarantee you ultimately want: attackers not being able to tamper with packages signed w/ offline keys. We stress in that blog post that TLs don't give you this property out of the box. We recommend using a TL on top of TUF so that you can audit the TUF metadata, if necessary.

ncoghlan commented 4 years ago

If everything was being signed with offline keys, then I also don't think a TL would add much.

However that isn't the situation that we expect to end up in, even if PEP 480 has been accepted and implemented. Instead, we expect the vast majority of packages to be signed by PyPI (with historical metadata protection provided via a TL), and a subset of publishers opting in to managing their own signing keys and gaining the full metadata protection offered by TUF.

trishankatdatadog commented 4 years ago

If everything was being signed with offline keys, then I also don't think a TL would add much.

However that isn't the situation that we expect to end up in, even if PEP 480 has been accepted and implemented. Instead, we expect the vast majority of packages to be signed by PyPI (with historical metadata protection provided via a TL), and a subset of publishers opting in to managing their own signing keys and gaining the full metadata protection offered by TUF.

Right. I should note that only a few publishers (e.g., top 20% packages) need to opt-in to get a lot of bang for the buck (e.g., 80% of downloads). That said, adding a TL to record history wouldn't hurt.

graingert commented 4 years ago

Also TL would be a nice upgrade to the --generate-hashes/--require-hashes workflow as you'd only need to record the latest TL root. And no need for any public key cryptography

trishankatdatadog commented 4 years ago

Also TL would be a nice upgrade to the --generate-hashes/--require-hashes workflow as you'd only need to record the latest TL root. And no need for any public key cryptography

Yes, at the risk of adding new, malicious versions of packages when the TL is compromised.

Also, my understanding is that you'd still need to add the hashes of specific versions of packages. Adding the root of the TL won't solve your problem, because that only tells you that some package got added, not which one.
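The point about the log root can be illustrated with a simplified Merkle tree. This is a sketch only: it is not the RFC 6962 construction used by Certificate Transparency, the leaf strings are hypothetical, and the list of leaves is assumed to be non-empty. It shows that a client holding only the root still needs the specific leaf (the package it cares about) plus an inclusion proof to learn anything about that package.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    proof is a list of (sibling_hash, sibling_is_left) pairs, bottom-up.
    Simplified tree: the last node is duplicated on odd-sized levels.
    """
    level = [_h(b"leaf:" + leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [
            _h(b"node:" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf, proof, root):
    """Recompute the root from one leaf and its proof; compare with the pinned root."""
    node = _h(b"leaf:" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"node:" + pair)
    return node == root


# Hypothetical log entries; a real log would record file names and digests.
log_entries = [b"pkg-a-1.0", b"pkg-b-2.1", b"pkg-c-0.3"]
pinned_root, proof_for_b = merkle_root_and_proof(log_entries, 1)
ok = verify_inclusion(b"pkg-b-2.1", proof_for_b, pinned_root)
tampered = verify_inclusion(b"pkg-b-2.1-evil", proof_for_b, pinned_root)
```

The root commits to everything in the log at once, so it says nothing on its own about which entry a client should install. That still has to come from a pinned version or hash, as in the `--require-hashes` workflow.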

graingert commented 4 years ago

Only if you repin the TL, same with --generate-hashes

brainwane commented 4 years ago

Hi all -- I think this conversation should probably be on the discussion thread on the PEP itself. Thanks!

brainwane commented 4 years ago

@dstufft wrote today:

It looks like discussion about the actual meat and potatoes of this PEP has petered out. Unless someone has an objection, I intend to accept this PEP on Friday.

brainwane commented 4 years ago

So, PEP 458 got accepted. @woodruffw @ewdurbin it looks like #7488 has not been merged yet -- is it waiting for something? I feel like I heard there was a securesystemslib issue blocking it but I'm not sure...

woodruffw commented 4 years ago

I feel like I heard there was a securesystemslib issue blocking it but I'm not sure...

Yep! The primary blocker on the securesystemslib side is API support for abstract storage backends, which is currently being tracked by https://github.com/secure-systems-lab/securesystemslib/pull/232. Those changes then need to be rolled into TUF, which hasn't begun yet.

The key generation and signing ceremonies also need to take place before #7488 can be fully deployed, although those can happen independently of any TUF/securesystemslib changes once I complete the runbook and provisioning scripts.
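As a rough illustration of what "abstract storage backends" could mean here, the sketch below separates metadata publishing from where the bytes end up. All names (`MetadataStorage`, `InMemoryStorage`, `publish_metadata`) and the file naming are hypothetical and are not the securesystemslib or TUF API; the in-memory class merely stands in for a cloud bucket.

```python
from abc import ABC, abstractmethod


class MetadataStorage(ABC):
    """Hypothetical minimal interface for reading and writing metadata files."""

    @abstractmethod
    def get(self, path: str) -> bytes:
        """Return the stored bytes for path."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None:
        """Store data under path, replacing any previous content."""


class InMemoryStorage(MetadataStorage):
    """Stand-in backend; a real deployment might wrap a cloud storage client."""

    def __init__(self):
        self._blobs = {}

    def get(self, path):
        return self._blobs[path]

    def put(self, path, data):
        self._blobs[path] = bytes(data)


def publish_metadata(storage, role, version, payload):
    """Write one versioned metadata file through whichever backend is given."""
    path = f"{version}.{role}.json"  # illustrative naming only
    storage.put(path, payload)
    return path


backend = InMemoryStorage()
written_path = publish_metadata(backend, "snapshot", 3, b"{}")
```

With an interface like this, the signing code does not need to know whether it is writing to local disk or to a bucket, which is roughly the decoupling the blocker above describes.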

ofek commented 4 years ago

@woodruffw Out of curiosity, what is the intended storage backend?

woodruffw commented 4 years ago

Out of curiosity, what is the intended storage backend?

Warehouse currently uses Google Cloud Storage via the S3-compatible API for file storage (i.e., for uploaded packages), and my intent was to use it for TUF's metadata as well.

di commented 4 years ago

via the S3-compatible API

We actually just use the regular Cloud Storage APIs:

https://github.com/pypa/warehouse/blob/630ac09321d93a6867f2b801153f45a90ba50d58/warehouse/gcloud.py#L27

https://github.com/pypa/warehouse/blob/630ac09321d93a6867f2b801153f45a90ba50d58/warehouse/packaging/services.py#L171-L193

woodruffw commented 4 years ago

We actually just use the regular Cloud Storage APIs:

Whoops! My bad. Using the regular APIs should also be fine 🙂

ofek commented 4 years ago

Ah nice, I also use Google Cloud Storage for storage 😄

Does Warehouse run on Kubernetes? https://ofek.dev/csi-gcs/

Many orgs are beginning to use that, one of the most recent (public) ones is EdgeDB https://github.com/edgedb/edgedb-pkg/commit/0bcfeb97ff29834f5e674106254c4de7c0a41d0b

brainwane commented 4 years ago

I see https://github.com/pypa/warehouse/pull/7488#issuecomment-638973770 mentions a few blockers that people are currently working on ("some bugs in the TUF reference implementation, namely missing roledb state when reloading the repository").

A few people are working on those, including @sechkova and @lukpueh and @trishankatdatadog. I'm sure they would welcome help.

jku commented 4 years ago

I've been drafting the pip side of this and have some unsolved issues related to the 'warehouse client contract' -- what exactly does warehouse promise about urls?

The first one is a tuf-warehouse implementation detail that just should be agreed on. The other two are more questions about Warehouse 'client API' in general -- it's easy to hack something that works now but might be very painful in the future when Warehouse architecture changes. The links hopefully contain enough context. I would really appreciate comments on those (either here or in my issues).

nealmcb commented 4 years ago

@jku Thank you for these questions and related issues. They do seem important to pin down, but I'm a bit unclear as to how they fit in to pip, or indeed what the typical flow of "trust" thru pip is. Can you point to the relevant sections of the PEP, or provide a bit more background (here or in the pip issues, as appropriate) on a few typical, relevant flows?

jku commented 4 years ago

Too many pip details in a Warehouse issue may be off-topic, but I'll share my "work-in-progress design doc" (I was a bit hesitant to share as it's really just notes to myself, but the first two pages should be readable): https://docs.google.com/document/d/1Xqwoy0z0MyLh9eZHgSftozRB_sxjd1IbkIVtvolVkFI/edit?usp=sharing. The Implementation notes section maybe makes it easier to understand why I think the issues are relevant.

I'll start discussing this in detail with pip community very soon in https://github.com/pypa/pip/issues/8585: I was just hoping the warehouse client contract issues would have solutions in sight before that.

As far as "trust" in pip goes, this is my interpretation of the current implementation:

With TUF the very high level overview is:

The PEP goes into more detail but that is not really visible to pip: the TUF client updater component handles the complexity. The major improvement IMO is that an attacker able to modify project index files (e.g. https://pypi.org/simple/django/) in a Warehouse server cannot fool clients into downloading/installing malicious URLs: the attacker would need access to the online signing keys as well. The secondary improvement is that even in the case of online signing key compromise, there is a recovery path.
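A small sketch of the last step of that flow, under stated assumptions: the TUF client is assumed to have already verified the signatures on the metadata, and what remains is checking a downloaded file against the length and hash recorded for it. The function name and the sample content are hypothetical; the `length` and `hashes` fields follow the shape of TUF targets metadata entries.

```python
import hashlib


def matches_trusted_metadata(content: bytes, target_info: dict) -> bool:
    """Check downloaded bytes against an entry from already-verified metadata."""
    # Reject on length first so an oversized download is never hashed as "valid".
    if len(content) != target_info["length"]:
        return False
    digest = hashlib.sha256(content).hexdigest()
    return digest == target_info["hashes"]["sha256"]


# Hypothetical index file content and the metadata entry that would describe it.
index_page = b"<a href='example-1.0.tar.gz'>example-1.0.tar.gz</a>"
trusted_entry = {
    "length": len(index_page),
    "hashes": {"sha256": hashlib.sha256(index_page).hexdigest()},
}

genuine = matches_trusted_metadata(index_page, trusted_entry)
modified = matches_trusted_metadata(
    index_page.replace(b"example", b"evilpkg"), trusted_entry
)
```

An attacker who can rewrite the index file but cannot re-sign the metadata fails this check, which is the property described above.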

brainwane commented 3 years ago

Via @ewdurbin: per this tweet and this blog post,

On Friday October 30th at 11:15 AM EDT the Python Software Foundation will be live streaming a remote key generation and signing ceremony to bootstrap The Update Framework for The Python Package Index.