pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Rollout plan for critical projects promo #11625

Closed: di closed this issue 2 years ago

di commented 2 years ago

The following steps should be followed to roll out the critical projects promo:

Launch

Post-launch (After Oct 1, 2022)

Next steps are in https://github.com/pypi/warehouse/issues/12308.

di commented 2 years ago

There's now a public dashboard for the relevant metrics here: https://p.datadoghq.com/sb/7dc8b3250-389f47d638b967dbb8f7edfd4c46acb1 (h/t @ewdurbin for beautifying this).

davidism commented 2 years ago

Heads up, you sent out an email with http://localhost in the URLs instead of https://pypi.org. This happens in Flask when URLs are generated outside an active request, such as when generating emails, and the app hasn't been configured to know its own external address; Pyramid probably behaves similarly.
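To illustrate the failure mode, here is a minimal sketch using only Python's standard-library WSGI helpers, not Flask's or Pyramid's actual code. When a link is built from a synthetic request environment, such as one created for a CLI task rather than a real HTTP request, the host and scheme fall back to local defaults (`wsgiref.util.setup_testing_defaults` picks `127.0.0.1` over `http`, analogous to the `http://localhost` links in the email). Supplying the canonical host and scheme up front fixes the links. The `make_link` helper and the `manage/projects/` path are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urljoin
from wsgiref.util import application_uri, setup_testing_defaults


def make_link(path: str, host: str | None = None, scheme: str | None = None) -> str:
    """Build an absolute link from a synthetic WSGI environ (hypothetical helper).

    Only the keys passed in are set explicitly; setup_testing_defaults fills
    in the rest with local defaults (127.0.0.1 over http).
    """
    environ: dict = {}
    if host is not None:
        environ["HTTP_HOST"] = host
    if scheme is not None:
        environ["wsgi.url_scheme"] = scheme
    setup_testing_defaults(environ)  # uses setdefault, so explicit keys win
    return urljoin(application_uri(environ), path)


# No canonical host configured: falls back to a local address.
print(make_link("manage/projects/"))  # http://127.0.0.1/manage/projects/
# Canonical host and scheme supplied explicitly.
print(make_link("manage/projects/", host="pypi.org", scheme="https"))  # https://pypi.org/manage/projects/
```

In Flask, the corresponding settings are the `SERVER_NAME` and `PREFERRED_URL_SCHEME` config values; Pyramid and Warehouse have their own mechanisms, which this sketch does not model.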

di commented 2 years ago

Thanks for the report, we're working on it 🙂

tomato42 commented 2 years ago

> Titan keys are only approved for sale in certain geographic regions, and thus can only be shipped to the following countries: Austria, Belgium, Canada, France, Germany, Italy, Japan, Spain, Switzerland, United Kingdom, and the United States.

Since when are Germany, Italy, France, etc. part of a different regulatory regime than the rest of the EU?

di commented 2 years ago

@tomato42 Unfortunately this is out of our control: these are the only countries in which Google is able to sell the product, and I don't have an explanation as to why.

mdmintz commented 2 years ago

I received the "[PyPI] A project you maintain has been designated as critical" email, but it would be helpful to know the criteria for that designation. Number of downloads? Number of GitHub stars? Number of other projects that have my project as a dependency? A combination of the above?

davidism commented 2 years ago

It's also a bit inconsistent. Jinja2 didn't get marked as critical, even though it's the most downloaded of my projects. Flask didn't get marked, but the less used Quart did.

Never mind, I think it's currently limited to some libraries that Warehouse uses, although not sure where Quart came from.

underyx commented 2 years ago

> Never mind, I think it's currently limited to some libraries that Warehouse uses, although not sure where Quart came from.

https://pypi.org/project/semgrep/ got marked as critical and it doesn't seem to be used by Warehouse (yet!)

di commented 2 years ago

> I received the "[PyPI] A project you maintain has been designated as critical" email, but it would be helpful to know the criteria for that designation. Number of downloads? Number of GitHub stars? Number of other projects that have my project as a dependency? A combination of the above?

Answers to this and many more questions are included at https://pypi.org/security-key-giveaway/

di commented 2 years ago

> It's also a bit inconsistent. Jinja2 didn't get marked as critical, even though it's the most downloaded of my projects. Flask didn't get marked, but the less used Quart did.

This does surprise me; I wonder if we have an issue with name normalization.

> Never mind, I think it's currently limited to some libraries that Warehouse uses, although not sure where Quart came from.

We expanded it to the top 1% by downloads. The query is here: https://github.com/pypi/warehouse/blob/714babdf83fe3414974a14e1accdae1527cf7473/warehouse/packaging/tasks.py#L42-L57
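For readers wondering what "top 1% by downloads" means operationally, here is a minimal Python sketch of such a cutoff over an in-memory mapping of project name to download count. It is not the BigQuery query linked above; the `top_percent` name, the round-up rule, and the alphabetical tie-breaking are assumptions. A result of >3,800 projects, as mentioned further down, would correspond to a bit over 380,000 projects in total.

```python
from __future__ import annotations


def top_percent(downloads: dict[str, int], percent: int = 1) -> list[str]:
    """Return the most-downloaded projects making up `percent`% of all projects.

    Assumptions (not taken from Warehouse): the cutoff rounds up, and ties at
    the boundary are broken alphabetically so the result is deterministic.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    # Integer ceiling of len * percent / 100, avoiding float rounding.
    k = (len(downloads) * percent + 99) // 100
    # Highest download count first; the name breaks ties.
    ranked = sorted(downloads.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _count in ranked[:k]]


counts = {"requests": 900, "boa-str": 50, "tiny": 1}  # made-up counts
print(top_percent(counts))  # ['requests']
```

Rounding up guarantees at least one project is selected whenever any exist; a production query might round differently.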

dstufft commented 2 years ago

Yes, IIRC BigQuery stores the names normalized, but that query is using Project.name, not Project.normalized_name.
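The mismatch is easy to reproduce. A minimal sketch, assuming the dataset holds PEP 503-normalized names (lowercase, with runs of `-`, `_` and `.` collapsed to `-`): comparing the raw name against those values silently drops any project whose name has capitals or other separators, such as `Jinja2` or `Pillow`, while normalizing both sides catches them. The `flag_critical` helper and the sample data are hypothetical; this is not the code from the eventual fix.

```python
from __future__ import annotations

import re

_SEPARATORS = re.compile(r"[-_.]+")


def normalize(name: str) -> str:
    """PEP 503 normalized name: lowercase, separator runs collapsed to '-'."""
    return _SEPARATORS.sub("-", name).lower()


def flag_critical(project_names: list[str], top_names: set[str]) -> set[str]:
    """Hypothetical helper: projects whose normalized name is in top_names.

    top_names is assumed to hold already-normalized names, as the BigQuery
    dataset is believed to store them.
    """
    return {name for name in project_names if normalize(name) in top_names}


projects = ["Jinja2", "Pillow", "boa-str", "Flask"]
top = {"jinja2", "pillow", "boa-str"}  # made-up sample, already normalized

# Raw comparison (the bug): only names that are already normalized match.
print({name for name in projects if name in top})  # {'boa-str'}
# Normalized comparison (the fix): capitalized names match too.
print(sorted(flag_critical(projects, top)))  # ['Jinja2', 'Pillow', 'boa-str']
```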

di commented 2 years ago

Yeah, we've only flipped the bit for 3381 projects; this should be >3800. Will address this.

di commented 2 years ago

@davidism https://github.com/pypi/warehouse/pull/11796 should fix this, and the bit should get flipped for these projects in ~8 hours.

tedmiston commented 2 years ago

I am a bit confused by how / what projects are getting marked as critical as well.

One of my projects that got marked as "critical" (https://pypi.org/project/boa-str/) is an old, very small and simple string manipulation library last released in 2017. It was basically a small internal dependency made external for convenience. It's nowhere near the level of a project like Flask or Jinja... I would be surprised if it had any external users at all, let alone met this criterion from the page linked above:

> What determines if project is a critical project?
>
> PyPI determines project eligibility based on download counts derived from PyPI's public dataset of download statistics. Any project in the top 1% of downloads over the prior 6 months is designated as critical.

I tried to run a simple query (below) against the public BigQuery dataset, but the very first query was denied with a free-tier quota error.

SELECT COUNT(1)
FROM `bigquery-public-data.pypi.file_downloads`
WHERE
  project = "boa-str"
  AND timestamp >= "2022-01-01"
GROUP BY `project`
LIMIT 10;

The error:

Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas
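One way to keep such a query inside the free tier is to scan less data: select only the columns needed and narrow the `timestamp` range, assuming the table is partitioned on `timestamp` (check the dataset's schema before relying on this). The sketch below only builds the SQL text and its named parameters (`@project`, `@start`); the `download_query` helper and the 180-day stand-in for "the prior 6 months" are hypothetical, and nothing here talks to BigQuery.

```python
from __future__ import annotations

from datetime import date, timedelta

# Table and column names follow the query quoted above.
QUERY = """
SELECT project, COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE project = @project
  AND timestamp >= TIMESTAMP(@start)
GROUP BY project
"""


def download_query(project: str, today: date, days: int = 180) -> tuple[str, dict[str, str]]:
    """Return SQL text plus named parameters for the last `days` days (hypothetical helper)."""
    if days <= 0:
        raise ValueError("days must be positive")
    start = today - timedelta(days=days)
    return QUERY, {"project": project, "start": start.isoformat()}


sql, params = download_query("boa-str", date(2022, 8, 1))
print(params)  # {'project': 'boa-str', 'start': '2022-02-02'}
```

With the `google-cloud-bigquery` client, these values could be passed as `bigquery.ScalarQueryParameter("project", "STRING", ...)` entries in a `QueryJobConfig`, and running the job first with `dry_run=True` reports `total_bytes_processed` without actually running the query.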

There is a small possibility that it's still in use as a dependency, e.g., being pulled into Docker containers running at scale, given that it was written for a startup that has since grown massively.

Are there other ways to access this data, e.g., a JSON export of the 3800 projects, to check whether this is a mistake?

alex commented 2 years ago

@tedmiston https://pepy.tech/project/boa-str or https://pypistats.org/packages/boa-str are both good ways to view this data. Looks like it gets quite a lot of downloads.
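For scripted checks, pypistats.org also offers a JSON API. A minimal sketch, assuming a `/api/packages/<package>/recent` endpoint that returns a body shaped like `{"data": {"last_day": ..., "last_week": ..., "last_month": ...}}`; verify both against the site's API documentation before relying on them. Parsing is kept separate from fetching so it can be checked offline, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import quote
from urllib.request import urlopen

API = "https://pypistats.org/api/packages/{}/recent"  # assumed endpoint


def recent_url(package: str) -> str:
    # Assumption: the API expects the lowercase package name.
    return API.format(quote(package.lower()))


def parse_recent(payload: str) -> dict[str, int]:
    """Extract recent download counts from the assumed response shape."""
    data = json.loads(payload)["data"]
    return {key: int(data[key]) for key in ("last_day", "last_week", "last_month")}


def fetch_recent(package: str, timeout: float = 10.0) -> dict[str, int]:
    # Network call; not exercised in the offline example below.
    with urlopen(recent_url(package), timeout=timeout) as resp:
        return parse_recent(resp.read().decode("utf-8"))


sample = '{"data": {"last_day": 1, "last_week": 7, "last_month": 30}}'  # made-up payload
print(parse_recent(sample))  # {'last_day': 1, 'last_week': 7, 'last_month': 30}
```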

tedmiston commented 2 years ago

@alex Thank you! It turns out xkcd was right after all.

(updates resume)

hugovk commented 2 years ago

> Are there other ways to access this data, e.g., a JSON export of the 3800 projects, to check whether this is a mistake?

I expect the top ~3,800 projects (over 6 months) will be somewhat similar to those on the monthly list at https://hugovk.github.io/top-pypi-packages/
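To check where a given project falls in such a list, a sketch like the following works on data shaped as `{"rows": [{"project": ..., "download_count": ...}, ...]}`; that shape is an assumption about the JSON export, and the sample rows below are made up. Names are compared after PEP 503 normalization so that `Jinja2` and `jinja2` match. The `rank_of` and `within_top` helpers are hypothetical.

```python
from __future__ import annotations

import re


def _normalize(name: str) -> str:
    # PEP 503: lowercase, collapse runs of -, _ and . into a single -.
    return re.sub(r"[-_.]+", "-", name).lower()


def rank_of(project: str, rows: list[dict]) -> int | None:
    """1-based rank of `project` by download_count, or None if absent."""
    ranked = sorted(rows, key=lambda row: row["download_count"], reverse=True)
    target = _normalize(project)
    for position, row in enumerate(ranked, start=1):
        if _normalize(row["project"]) == target:
            return position
    return None


def within_top(project: str, rows: list[dict], n: int) -> bool:
    """True if `project` ranks in the top `n` rows."""
    rank = rank_of(project, rows)
    return rank is not None and rank <= n


rows = [  # made-up sample in the assumed format
    {"project": "alpha", "download_count": 300},
    {"project": "jinja2", "download_count": 75},
    {"project": "boa-str", "download_count": 1},
]
print(rank_of("Jinja2", rows))  # 2
```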

@davidism Jinja2 is number 36 (75 million monthly downloads) so should be included, and likewise Pillow at 60 (43m). Pillow also isn't currently marked as critical, but both have a capital initial so I expect the normalisation fix will sort that in a few hours 👍

@tedmiston boa-str is at number 1,340 with 878k downloads!

CaselIT commented 2 years ago

There still seem to be some problems with the query. For example, sqlalchemy is not marked as critical even though it's both a top 1% project and used by Warehouse.

dstufft commented 2 years ago

The query hasn't re-run yet, it runs once a day.

hugovk commented 2 years ago

Pillow is now marked as critical, and there's the bump from 3.38k to 3.82k critical projects on the dashboard:

[Screenshot of the dashboard]

Thanks!

tedmiston commented 2 years ago

One note from a UX perspective: I enabled 2FA via an app preemptively, ahead of getting the hardware key. But as soon as one does this, the page at https://pypi.org/security-key-giveaway/ decides you're not eligible for the hardware key. It was trivial to remove it, request the hardware key, and re-enable it, but it would be nice if 3800 of us didn't have to do that 🙃.

Edit: Never mind about getting the order through... it looks like Google is sold out of both keys in the U.S. now. [The USB-C key shows as in stock on the product page, but out of stock once added to the cart. The USB-A key shows as out of stock on the product page.]

ssbarnea commented 2 years ago

I am a maintainer of 16 projects marked as critical, but I am still not eligible to get a hardware key because I did the right thing and adopted (software-based) 2FA previously. That is hilarious.

I am not sure if @tedmiston's trick still works, but I can see how this program could easily have the opposite of the desired effect.

di commented 2 years ago

Our goal is to get as many people as possible to use 2FA. Our constraint is that we have a limited number of hardware keys to give away.

While I agree that hardware keys should be preferred over TOTP, if you already have 2FA enabled via TOTP and take a pair of free keys, that potentially means one less person can enable 2FA.

That said, the discount codes expire Oct 1. If it looks like we'll have a surplus of discount codes by then, I'd support adjusting this policy to allow TOTP users to acquire hardware keys as well.

ssbarnea commented 2 years ago

If someone has never used hardware keys, I would recommend the software approach to them, as it is very easy to put the TOTP into your preferred password manager or just use an app like Google Authenticator. Using a HW token is considerably less convenient.

Forcing 2FA is a no-brainer, and I would support even more aggressive rollout methods (1% is quite low). I think that those who fight it are very few and fall into the category that does not give a (dime) about security for users, as in the end nobody is exempt from being hacked.

A big thank you to all those that made the 1% group!

di commented 2 years ago

@hugovk, when you got the email for Pillow, did it have HTTP or HTTPS links? I believe it should have been HTTPS and that https://github.com/pypi/warehouse/issues/11802 is just a side effect of us running the task via CLI instead of via cron.

hugovk commented 2 years ago

It had HTTP. The first email was for projects with lowercase names; the second was for Pillow:

[Screenshot of the two emails]

memsharded commented 2 years ago

Hi, we have also been designated as a critical project. We have been deploying/publishing releases to PyPI directly from our CI (running in the cloud), fully automated. It is not clear to me, or I cannot find, how it is possible to keep doing this: both the physical key and the authenticator apps seem to work only for manual publishing. Am I missing something? Many thanks!

alex commented 2 years ago

API keys can be used to accomplish this: https://pypi.org/help/#apitoken
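For a fully automated CI pipeline, the usual pattern is to store a PyPI API token as a CI secret and give it to the upload tool with the literal username `__token__`. A minimal sketch that renders a `.pypirc` for that purpose; the `render_pypirc` helper and the `PYPI_API_TOKEN` variable name are hypothetical choices. Tools such as twine can alternatively read `TWINE_USERNAME`/`TWINE_PASSWORD` from the environment directly.

```python
from __future__ import annotations

import configparser
import io
import os


def render_pypirc(token: str) -> str:
    """Render a .pypirc that authenticates to PyPI with an API token."""
    if not token.startswith("pypi-"):
        # PyPI API tokens carry a "pypi-" prefix; catch a wrong secret early.
        raise ValueError("expected a PyPI API token starting with 'pypi-'")
    # interpolation=None so '%' characters in a secret are stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["distutils"] = {"index-servers": "pypi"}
    config["pypi"] = {"username": "__token__", "password": token}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


def pypirc_from_env(var: str = "PYPI_API_TOKEN") -> str:
    # Hypothetical CI secret name; write the result with restrictive file permissions.
    return render_pypirc(os.environ[var])
```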

memsharded commented 2 years ago

> API keys can be used to accomplish this: https://pypi.org/help/#apitoken

This is what we were already using, and it started failing today, so we assumed it was because 2FA was being enabled. We have also tried manually enabling 2FA, and it is still failing with "Backend is unhealthy". It might be a temporary issue; we will try again tomorrow and report back. Thanks!

alex commented 2 years ago

Backend is unhealthy means the CDN is having trouble talking to the application servers. https://status.python.org/ shows some spikes in error metrics, not sure if that's related. In any event, it's unrelated to 2FA requirements :-)

FirefoxMetzger commented 2 years ago

Hm, perhaps not the right place for this, but would it be useful to display a "critical package" badge on the pypi page, or make a badge for it to add to the repo if desired?

At the moment it mostly feels like another hurdle to jump through when we perform the release dance that happens somewhere deep in dev/maintainer land. I see how it may benefit security in general, but as far as I understand, the main reason for this promo is to show that PyPI is taking security seriously so that users (and downstream packages) can trust their dependencies a bit more. It would be nice to have something to show for that.

ssbarnea commented 2 years ago

@FirefoxMetzger I am strongly in favor of starting to add badges, but it is not so simple. For example, I still find "critical" misleading, because what was actually used to determine this was the download traffic in the last 6 months. I would say that criticality is more likely related to how many other projects use a package, something PyPI cannot yet determine.

For example, I would support making the "Sole Owner" badge public because, as far as I am concerned, that is a security and maintenance risk too: it means "only one person can publish". That person might go rogue at some point, or just become permanently unavailable. For me that might be a very good reason for marking a package as risky/problematic in a public way. In fact, lack of use of bot accounts with tokens for uploading packages is another red flag, but that is currently close to impossible for PyPI to determine. Still, let's open a discussion thread, as this issue is not the right place to discuss these.

AFAIK, nobody should ever publish packages using personal credentials. The only exception is when you bootstrap a new project to reserve the namespace, but even this can be done with tokens.

FirefoxMetzger commented 2 years ago

> Still, let's open a discussion thread as this issue is not the right place to discuss these.

@ssbarnea Sure, feel free to ping me and I'm happy to chime in.

> I would support making public the "Sole Owner" badge as, as far as I am concerned, that is a security and maintenance risk [...] marking a package as risky/problematic in a public way [...] lack of use of bot accounts with tokens for uploading packages is another red flag [...] nobody should ever publish packages using personal credentials

Those are all very valid points from a security perspective and I agree that those are concerns to keep in mind. At the same time, I doubt that many maintainers are "sole owners" because they want to be, but rather because they haven't yet found others to join them in maintaining the package. In my (perhaps limited) experience, this change usually happens through increased adoption of the package because you'll eventually run into motivated individuals that volunteer to help out. I'm not entirely convinced that more pressure on sole maintainers (in the form of a "this repo is risky to use because there is only one person maintaining it" badge) will help improve the situation.

Instead, I was thinking that a "your project is a critical piece of infra, keep up the good work" badge doesn't cost much, shows appreciation for people spending their free time on this, might encourage sole maintainers to adhere to best practices (you want to live up to the expectations others have of you), and will at worst do nothing. It's also complementary to any crackdown actions on packages that could be maintained better (e.g., enforced 2FA), so I figured I could at least suggest it :)

di commented 2 years ago

I separated the rollout plan for the 2FA requirement for critical projects into https://github.com/pypi/warehouse/issues/12308.