pypa / twine

Utilities for interacting with PyPI
https://twine.readthedocs.io/
Apache License 2.0
1.61k stars 308 forks source link

test_make_user_agent_string() failing in Debian build #870

Closed stefanor closed 2 years ago

stefanor commented 2 years ago

Your Environment

1) Your operating system: Debian/unstable 2) Version of python you are running: 3.9.10 and 3.10.2. 3) How did you install twine? Building from the sdist. 4) Version of twine you have installed: 3.8.0

The Issue

Please describe the issue that you are experiencing.

    def test_make_user_agent_string(default_repo):
        """Add twine and its dependencies to User-Agent session header."""
        assert "User-Agent" in default_repo.session.headers

        user_agent = default_repo.session.headers["User-Agent"]
        packages = (
            "twine/",
            "requests/",
            "requests-toolbelt/",
            "pkginfo/",
            "importlib_metadata/",
        )
        for p in packages:
>           assert p in user_agent
E           AssertionError: assert 'importlib_metadata/' in 'twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/24.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.9.10'

On a sailboat with terrible Internet connectivity right now, I don't have the bandwidth to dig around. But:

Not sure exactly what changed here to cause importlib-metadata to not become importlib_metadata It seems that the change to list_dependencies_and_versions() in #858 expected importlib_metadata and that isn't being delivered in this run.

Maybe the difference is that in my environment importlib_metadata's metadata is in egg-info form, not dist-info form.

Steps to Reproduce

If the issue is predictable and consistently reproducible, please list the steps here.

  1. Run the tests with python3-importlib-metadata from the Debian archive.
bhrutledge commented 2 years ago

@stefanor It's not clear to me why you're trying to run the tests in the environment that you described, or why you've even got an environment set up that way. However, it does sound like it's deviating from what I consider to be Twine's "standard" development/test environment, i.e. virtual environments, managed by tox, where Twine and its dependencies are installed via pip install -e .. Unless there's a compelling reason, I don't think we'll do much to support other setups.

That said, I'd be surprised if #858 is the root cause, because it doesn't change anything related to the version of importlib_metadata.

sigmavirus24 commented 2 years ago

I suspect that this is because importlib_metadata isn't going through the right processes to normalize the module name in PKGINFO. We rely on that to generate the user-agent string. If that isn't generated properly, I don't think there's anything for us to do.

stefanor commented 2 years ago

It's not clear to me why you're trying to run the tests in the environment that you described

We (Debian) like to run the test-suites of packages when we build them (and when any of their dependencies change), so that we can be sure that we're shipping something that's working.

or why you've even got an environment set up that way

importlib_metadata is currently built with setup.py in Debian. That will probably change in the future, but that's how things stand right now.

That said, I'd be surprised if https://github.com/pypa/twine/pull/858 is the root cause

Now that I'm off an island with better cell coverage, and can play around, it looks simpler than I thought:

$ python3 -m venv testve
$ testve/bin/python -m pip install wheel
$ testve/bin/python -m pip install twine
...
Successfully installed Pygments-2.11.2 SecretStorage-3.3.1 bleach-4.1.0 certifi-2021.10.8 cffi-1.15.0 charset-normalizer-2.0.11 colorama-0.4.4 cryptography-36.0.1 docutils-0.18.1 idna-3.3 importlib-metadata-4.10.1 jeepney-0.7.1 keyring-23.5.0 packaging-21.3 pkginfo-1.8.2 pycparser-2.21 pyparsing-3.0.7 readme-renderer-32.0 requests-2.27.1 requests-toolbelt-0.9.1 rfc3986-2.0.0 six-1.16.0 tqdm-4.62.3 twine-3.8.0 urllib3-1.26.8 webencodings-0.5.1 zipp-3.7.0
$ testve/bin/python -m twine --version
twine version 3.8.0 (pkginfo: 1.8.2, readme-renderer: 32.0, requests: 2.27.1, requests-
toolbelt: 0.9.1, urllib3: 1.26.8, tqdm: 4.62.3, importlib-metadata: 4.10.1, keyring: 23.5.0,
rfc3986: 2.0.0, colorama: 0.4.4)
$ testve/bin/python -m pip install twine==3.7.1
$ testve/bin/python -m twine --version
twine version 3.7.1 (importlib_metadata: 4.10.1, pkginfo: 1.8.2, requests: 2.27.1, requests-
toolbelt: 0.9.1, tqdm: 4.62.3)

That's a pretty standard setup, and the name changed, I'd assume that's #858.

How much do you care about the User-Agent format?

bhrutledge commented 2 years ago

Ah, I see that I misspoke; #858 clearly changes from an explicit list of package names to using Twine's metadata, which could result in the observed behavior (which I was able to reproduce from the previous comment). Given that, I don't understand why the test is passing in Twine's CI, and I don't know how to reproduce the failing test.

How much do you care about the User-Agent format?

Not very much, I think.

sigmavirus24 commented 2 years ago

How much do you care about the User-Agent format?

PyPI can parse it and use it for the purpose of debugging why a request went bad. Since PyPI can parse both importlib_metadata and it's normalized importlib-metadata as the same thing by normalizing them, it shouldn't matter. I doubt other barely supported package repositories care about the U-A string.

sigmavirus24 commented 2 years ago

I'll also point out that we were sending a static, constrained list of dependency names and versions. Now we seem to be including things that aren't strictly necessary to send which means that from a privacy stand-point we're potentially sending more information than a user might be comfortable (given users are probably already uncomfortable that we're sending packages and versions in the U-A).

Would it be plausible to filter what we send to PyPI to what it would need to understand why a thing failed? I don't imagine PyPI should care that we're using readme-renderer, bleach, SecretStorage, keyring, colorama, etc. It might care about requests, requests-toolbelt, urllib3, pkginfo, and importlib-metadata though

sigmavirus24 commented 2 years ago

We probably also want to see if PyPI even uses any of this and if it's worth sending in the U-A. When I added it, I had talked to Donald about the potential value but I don't know if it's ever been realized

stefanor commented 2 years ago

After a little more playing around:

From what I can tell, the difference in behaviour is that when tox runs it has the twine sources in the current working directory, and prefers those to the twine module installed in the venv. That means it is reading twine's .egg-info instead of its .dist-info, and getting the non-normalized module name.

So, I'd suggest that the test is checking for the wrong string, but as noted above, this isn't a big deal.

For the moment, I'm carrying a patch in Debian to change the checked string.

bhrutledge commented 2 years ago

I'll also point out that we were sending a static, constrained list of dependency names and versions. Would it be plausible to filter what we send to PyPI to what it would need to understand why a thing failed? We probably also want to see if PyPI even uses any of this and if it's worth sending in the U-A.

This all sounds reasonable to me, so I'm reopening this issue.

stratakis commented 2 years ago

We also got the same issue on Fedora while trying to build the latest version as an rpm and running the tests.

bhrutledge commented 2 years ago

We probably also want to see if PyPI even uses any of this and if it's worth sending in the U-A. When I added it, I had talked to Donald about the potential value but I don't know if it's ever been realized

@di Before I go searching the warehouse code, do you have a quick answer to whether or not PyPI uses the packages and versions that Twine is sending in the User-Agent string?

bhrutledge commented 2 years ago

A code search in warehouse seems to indicate that User-Agent is just used to set the uploaded_via field on the File and Release models, which only seems to be used in the metadata for BigQuery.

So, it seems fine to remove the dependencies from the User-Agent string:

https://github.com/pypa/twine/blob/133f8ae254fe51b4f75a59b101efa413adc4cc8d/twine/repository.py#L93-L99