rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Fixup of repository-relative URLs in READMEs #8434

Open kornelski opened 4 years ago

kornelski commented 4 years ago

Technically, README files are not supposed to contain relative URLs, because the README file itself doesn't have its own base URL after being published.

In practice it's usually expected that relative URLs in READMEs will be rewritten the same way GitHub does it, but that a) depends on an implementation detail of a proprietary service, and b) is a complicated transformation:

https://users.rust-lang.org/t/psa-please-use-absolute-urls-in-crate-readmes/45136

crates-io already has some code to fix relative URLs in READMEs, but it has to assume the README lives at the root of the repository, and that the main branch is called master. It's very hard to do any better after the crate has been published.
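To illustrate the limitation, here is a minimal sketch of that kind of root-and-`master` rewrite. It is not the actual crates.io code: the function name `naive_rewrite` and the GitHub-style `/blob/<branch>/<path>` layout are assumptions made for this example.

```rust
/// Naive rewrite: treats every relative link as relative to the repository
/// root on a branch named `master`. Illustrative only, not crates.io's code.
fn naive_rewrite(repo_url: &str, link: &str) -> String {
    // Links with a scheme and in-page anchors are left untouched.
    // (Scheme detection via "://" is a simplification; e.g. `mailto:` is missed.)
    if link.contains("://") || link.starts_with('#') {
        return link.to_string();
    }
    // Normalize the repository URL: drop a trailing slash and a `.git` suffix.
    let repo = repo_url.trim_end_matches('/').trim_end_matches(".git");
    // Assumption: GitHub-style URL layout, README at the root, branch `master`.
    format!("{}/blob/master/{}", repo, link.trim_start_matches("./"))
}
```

For a README stored at `crates/foo/README.md`, a link to `docs/guide.md` should resolve to `crates/foo/docs/guide.md`, but this rewrite points it at `docs/guide.md` in the repository root, which is the failure mode described above.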

The URLs are relative to the position of the README inside the repository, but Cargo.toml doesn't contain that information. In monorepos with multiple crates, the README may not be in the repository root. Getting that path after a crate is published requires cloning and searching the repository. Before publishing, Cargo could simply check the local git checkout.
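As a rough sketch of what "checking the local git checkout" could look like, the code below walks up from the crate directory until it finds a `.git` entry and then expresses the README path relative to that directory. The function names are hypothetical and this is not how Cargo is implemented; it also assumes the `readme` value contains no `..` components.

```rust
use std::path::{Path, PathBuf};

/// Walk up from `start` until a directory containing a `.git` entry is found.
/// `.git` may be a directory or a file (worktrees, submodules); both count.
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").exists())
        .map(Path::to_path_buf)
}

/// Path of the README relative to the repository root,
/// e.g. `crates/foo/README.md` for a crate nested in a monorepo.
fn readme_path_in_repo(manifest_dir: &Path, readme: &str) -> Option<PathBuf> {
    let root = find_repo_root(manifest_dir)?;
    // Assumption: `readme` is a plain relative path without `..` segments.
    let absolute = manifest_dir.join(readme);
    let relative = absolute.strip_prefix(&root).ok()?;
    Some(relative.to_path_buf())
}
```

Calling `readme_path_in_repo` with the directory that holds Cargo.toml and the manifest's `readme` value yields the repo-relative location that a registry cannot cheaply recover after publishing.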

I see a few ways to improve this situation:

nipunn1313 commented 3 years ago

Hi. I implemented #9837 before coming across this issue.

In #9837, I proposed updating readme_file to be a repo-relative path (a breaking change). In tandem with rust-lang/crates.io#3861, this would get crates.io to rewrite the relative links without assuming the README is at the repository root.

I would note a few points from my research:

One big question is whose responsibility it is to rewrite links in README

Currently, registries own URL rewriting, and have varying degrees of knowledge of popular hosting services such as GitHub and GitLab.

Move ownership to cargo

Moving that logic into Cargo would have the advantage of centralizing it in one place. However, it would push more burden onto the publish step of Cargo (client-side software). Currently Cargo isn't in the business of poking around in the contents of READMEs. It seems like a sensible option, though it would take a bit of effort.

Keep ownership of rewriting in registry

Cargo would need to provide more information to the registry to do the rewriting, notably the path to the crate within the repo. This could be placed as extra upload metadata alongside readme_file, or in cargo_vcs_info.json. We could have all the registries use the crate @kornelski has available, in order to at least centralize the logic.
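To make the idea concrete, here is a minimal sketch of how a registry might use such a path once it is available. The parameter names (`readme_dir`, `git_ref`), the GitHub-style `/blob/<ref>/<path>` layout, and the choice to treat a leading `/` as repository-root-relative are all assumptions for illustration; this is not the crate mentioned above, nor an existing registry API.

```rust
/// Resolve `link` against the directory holding the README, with both paths
/// expressed relative to the repository root.
/// Returns `None` if the link climbs above the repository root.
fn resolve_in_repo(readme_dir: &str, link: &str) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    // This sketch treats a leading `/` as "relative to the repository root".
    if !link.starts_with('/') {
        parts.extend(readme_dir.split('/').filter(|s| !s.is_empty()));
    }
    for segment in link.split('/') {
        match segment {
            // Empty and `.` segments do not change the location.
            "" | "." => {}
            // `..` removes one level; popping an empty stack escapes the repo.
            ".." => {
                parts.pop()?;
            }
            other => parts.push(other),
        }
    }
    Some(parts.join("/"))
}

/// Build an absolute URL, assuming a GitHub-style layout (other hosts differ).
fn absolute_url(repo_url: &str, git_ref: &str, readme_dir: &str, link: &str) -> Option<String> {
    let path = resolve_in_repo(readme_dir, link)?;
    Some(format!("{}/blob/{}/{}", repo_url.trim_end_matches('/'), git_ref, path))
}
```

With `readme_dir` set to `crates/foo`, the link `../../LICENSE` resolves to `LICENSE` at the repository root, while `../../../x` is rejected because it would leave the repository.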