pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0
3.61k stars 968 forks source link

Add support for github badges when they are relatively linked #12886

Open zackees opened 1 year ago

zackees commented 1 year ago

What's the problem this feature will solve?

Pypi does not handle relative links to badges for github pages such as:

[![Ubuntu_Tests](../../actions/workflows/push_ubuntu.yml/badge.svg)](../../actions/workflows/push_ubuntu.yml)

and will display them as broken links.

Describe the solution you'd like

If the user has a github url account, and the relative links can substituted for absolute paths, then convert the urls.

Here is my current work around to convert relative links to absolute links when I upload a package.

https://github.com/zackees/playaudio/blob/main/setup.py

di commented 1 year ago

If the user has a github url account

Can you provide more detail about what you mean by this?

zackees commented 1 year ago

Sure.

Let's start off with the broken behavior due to relative links: https://pypi.org/project/template-python-cmd/

Notice all the links are 404'd and broken. This happens a lot. And not just with badges either, but any relative link in the readme file could be broken when it lands on pypi.

Why not just use absolute paths?

Because when a repo is forked, the resources with absolute paths will point back to the parent fork, not the child. Relative links fix it so that links will point to the current fork because, well, they are relative.

However, when said package is uploaded to pypi, the relative links will now refer to the pypi url and this results in broken 404 links.

So as a work around, users will have to create upload code in setup.py that ingests the readme file, finds the relative links and resolves them to absolute links before being passed to pypi. Then when they land on pypi they will correctly point back to the github url because they are absolute links.

For example:

[![MacOS_Tests](../../actions/workflows/push_macos.yml/badge.svg)](../../actions/workflows/push_macos.yml)

Will be processed by my setup script so that it resolves into:

[![MacOS_Tests](https://github.com/zackees/playaudio/actions/workflows/push_macos.yml/badge.svg)](https://github.com/zackees/playaudio/actions/workflows/push_macos.yml)

Before being sent to pypi. This is all done on the fly. I can't ingest a raw readme file anymore which means that I can no longer use pyproject.toml by itself. I must now use setup.py + pyproject.toml together in order to get the readme file resolved before the package is uploaded.

So obviously there is friction here. Users should use relative links everywhere in github. But now all their projects need special code so that they can be uploaded to pypi. Many users won't do this. The fix instead can be applied serverside at pypi. And this would make sense to a lot of people, because relative links don't mean anything when they land on pypi. It could also theoretically be a security vulnerability.

It would only apply if:

  1. relative links are provided in the uploaded package
  2. and there is a project url provided
  3. and the project url + relative url resolves to a resource that can be found (200 vs 404)

This wouldn't just apply to badges but all relative links on the readme (ie long_description). This will likely fix a majority of broken links now found on pypi, and it's easy to implement.

di commented 1 year ago

Sorry, I was asking what you meant by "If the user has a github url account" (see the quoted text). I think based on your explanation, you mean that there's a project URL for a GitHub repo, correct?

One issue is that even if the project URL is a GitHub repo, assuming all relative links should go to GitHub might be too big of an assumption. They might be links to files within the source tree, for example, or they could be typos.

dstufft commented 1 year ago

I'm personally not a big fan of the idea of having to alias some relative URL on the PyPI side, we don't modify the data that people send to us and this doesn't feel like a big enough thing to break that assumption for us.

zackees commented 1 year ago

The reason I say github repo url is because a lot of code comes from github so special casing this host could provide the fix for most of the 404 links without it having to be rolled out everywhere (yet).

zackees commented 1 year ago

I'm personally not a big fan of the idea of having to alias some relative URL on the PyPI side, we don't modify the data that people send to us and this doesn't feel like a big enough thing to break that assumption for us.

FYI. Pypi already aliases all the badge icons from github, freezing the state of the badge when the package is uploaded. Do you really want relative links to point to internal (404) pypi resources? Seems like a security risk.

webknjaz commented 1 year ago

The reason I say github repo url is because a lot of code comes from github so special casing this host could provide the fix for most of the 404 links without it having to be rolled out everywhere (yet).

I'd expect that a better fix would be having that linted in Twine, though. Maybe, displaying warnings to the maintainers would also be a good idea. But I agree with Donald — this is not something that should be special-cased. Also, it seems like a bad idea to challenge GitHub's rate limits from the PyPI network, issuing numerous probe queries all the time.