pypa / readme_renderer

Safely render long_description/README files in Warehouse
Apache License 2.0

Resolve relative links in long_description #163

Open · protolambda opened this issue 4 years ago

protolambda commented 4 years ago

What's the problem this feature will solve?

Broken relative links in many PyPI-hosted project descriptions.

Describe the solution you'd like

Add an option for project maintainers that sets the base URL that relative links are resolved against.

HTML itself has a similar feature, the <base> element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

Even better would be the ability to use package properties, such as the version, in the base path that relative links resolve against.

For example, github.com/example/foobar/tree/v$VERSION could be configured as the base path.
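As a rough sketch of the idea, a renderer could fill in the version with Python's `string.Template` (whose `$NAME` syntax happens to match the example) and then resolve each link with `urllib.parse.urljoin`. This assumes the base is configured as a full URL with a scheme, unlike the shorthand above, and that `$VERSION` is the only placeholder. The `resolve_link` helper is hypothetical and not part of readme_renderer.

```python
from string import Template
from urllib.parse import urljoin, urlsplit


def resolve_link(href: str, base_template: str, version: str) -> str:
    """Resolve a relative README link against a version-specific base URL.

    Hypothetical helper for illustration; not part of readme_renderer.
    """
    # Links that already carry a scheme (https:, mailto:, ...) and
    # in-page anchors are returned unchanged.
    if urlsplit(href).scheme or href.startswith("#"):
        return href
    # Fill in $VERSION, then make sure the base ends with "/" so that
    # urljoin treats its last segment as a directory rather than a file.
    base = Template(base_template).substitute(VERSION=version)
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, href)


print(resolve_link("docs/usage.md",
                   "https://github.com/example/foobar/tree/v$VERSION",
                   "1.2.0"))
# → https://github.com/example/foobar/tree/v1.2.0/docs/usage.md
```

The trailing-slash step matters because `urljoin` replaces the last path segment of a base that does not end in `/`.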

Additional context

A common use case is to load the project README into the package's long_description. This is often done with:

from setuptools import setup

with open("README.md", "rt", encoding="utf8") as f:
    readme = f.read()

setup(
    # ... other project metadata ...
    long_description=readme,
    long_description_content_type="text/markdown",
)

This works, but relative links break. Hardcoding absolute links is poor practice too: GitHub-hosted README links, for example, get pinned to one version instead of pointing to the resource on the version the reader is viewing. Resources also move sometimes, and the ability to point at a different absolute path would help keep older documentation working.

PyPI, as the viewer of the description, could resolve these relative links. It could even default to the project's home page (which many projects set to their GitHub repository), fixing thousands of broken relative links in descriptions already on PyPI.
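A sketch of how the viewer side might do this as a post-processing step on rendered HTML. It assumes a hypothetical `relative_base` setting with the home page as the fallback, and assumes `href`/`src` attributes are double-quoted. A regex keeps the sketch short; a real implementation would more likely walk the HTML with a parser. The name `absolutize_html` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Matches href="..." and src="..." attributes; assumes double quotes.
_LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"')


def absolutize_html(html: str, relative_base: Optional[str], home_page: str) -> str:
    """Rewrite relative href/src URLs in rendered HTML against a base URL.

    Uses the (hypothetical) relative_base setting when present and falls
    back to the project's home page otherwise.
    """
    base = relative_base or home_page
    if not base.endswith("/"):
        base += "/"

    def fix(match: "re.Match[str]") -> str:
        attr, url = match.group(1), match.group(2)
        # Keep absolute URLs (any scheme), protocol-relative URLs and anchors.
        if urlsplit(url).scheme or url.startswith(("//", "#")):
            return match.group(0)
        return f'{attr}="{urljoin(base, url)}"'

    return _LINK_ATTR.sub(fix, html)


html = '<a href="docs/usage.md">Usage</a> <a href="#top">Top</a>'
print(absolutize_html(html, None, "https://github.com/example/foobar/blob/main"))
# → <a href="https://github.com/example/foobar/blob/main/docs/usage.md">Usage</a> <a href="#top">Top</a>
```

Whether a home-page fallback produces working links depends on the host's URL layout. On GitHub, for instance, file URLs include a segment such as `/blob/<branch>/`, which a bare repository URL lacks.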

Mattwmaster58 commented 4 years ago

Here's an example of a project with this issue: https://pypi.org/project/black/

protolambda commented 4 years ago

If it helps: one of the bigger projects I work on has a specs repository whose code snippets can all be used as a package. Because the project is so documentation-heavy, its local links are all relative, so the links in its README on PyPI are all broken: https://pypi.org/project/eth2spec/

Another way to implement this, without needing a PyPI project setting, is to read the setting from the README itself. If PyPI could read a Markdown comment such as <!-- pypi: relative_base=https://github.com/foobar/example/ -->, the links could be made to work.
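A minimal sketch of reading such a setting. It assumes the comment is written as a standard HTML comment, `<!-- pypi: relative_base=URL -->`, and that the URL contains no whitespace. The syntax is only a proposal, and `find_relative_base` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumes the proposed setting written as a standard HTML comment, e.g.
# <!-- pypi: relative_base=https://github.com/foobar/example/ -->
_BASE_COMMENT = re.compile(r"<!--\s*pypi:\s*relative_base=(\S+)\s*-->")


def find_relative_base(markdown: str) -> Optional[str]:
    """Return the base URL from a (hypothetical) pypi comment, or None."""
    match = _BASE_COMMENT.search(markdown)
    return match.group(1) if match else None


readme = "<!-- pypi: relative_base=https://github.com/foobar/example/ -->\n# Example\n"
print(find_relative_base(readme))
# → https://github.com/foobar/example/
```

Because it is an HTML comment, the setting stays invisible when the README is rendered elsewhere.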

Still, it would be better to fix all the existing broken links and support this with a PyPI project setting.

di commented 4 years ago

I think we could probably do this by supporting some custom front matter for Markdown descriptions.
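A minimal, standard-library-only sketch of front-matter handling. It assumes a leading block delimited by `---` lines that contains flat `key: value` pairs; real front matter is usually YAML, which would need a YAML parser. The `relative_base` key and the `split_front_matter` name are hypothetical.

```python
from typing import Dict, Tuple


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a leading '---'-delimited block of 'key: value' lines off a document.

    Only flat key/value pairs are handled (no full YAML).
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the body to render.
            return meta, "".join(lines[i + 1:])
        # Split at the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as an ordinary document.
    return {}, text


doc = "---\nrelative_base: https://github.com/example/foobar/blob/main/\n---\n# Title\n"
meta, body = split_front_matter(doc)
print(meta["relative_base"])
# → https://github.com/example/foobar/blob/main/
```

Stripping the block before rendering keeps the setting out of the displayed description.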

We'd want to be able to support the same functionality for rST-formatted descriptions as well though, and I'm not familiar enough with rST to say how to do that.

jamadden commented 4 years ago

> We'd want to be able to support the same functionality for rST-formatted descriptions as well though, and I'm not familiar enough with rST to say how to do that.

The Nikola static blog engine supports extensive metadata in both Markdown and reST. Besides specially formatted comments, it can use reST's standard "docinfo" nodes, which is what I use:

How to make money
=================

:slug: how-to-make-money
:date: 2012-09-15 19:52:05 UTC

These have the advantage of being parsed by docutils itself: there is a set of standard fields, and arbitrary custom fields can be used as well. Getting the data is easy:

        import docutils.nodes

        def extract_metadata(document):
            """Collect the title and docinfo fields of a parsed docutils document."""
            meta = {}
            if 'title' in document:
                meta['title'] = document['title']
            for docinfo in document.traverse(docutils.nodes.docinfo):
                for element in docinfo.children:
                    if element.tagname == 'field':  # custom fields (e.g. summary)
                        name_elem, body_elem = element.children
                        name = name_elem.astext()
                        value = body_elem.astext()
                    elif element.tagname == 'authors':  # author list
                        name = element.tagname
                        value = [author.astext() for author in element.children]
                    else:  # standard fields (e.g. address)
                        name = element.tagname
                        value = element.astext()
                    meta[name.lower()] = value
            return meta

Tools that don't know about custom fields render them in HTML as a field list (essentially a table). By contrast, an unknown custom directive can do anything from producing an error in the rendered document to nothing at all, depending on settings. Nikola strips the docinfo nodes so they don't appear in the rendered output, which is easy:

        # Collect the nodes first, since they are removed while iterating.
        for node in list(document.traverse(docutils.nodes.docinfo)):
            node.parent.remove(node)

miketheman commented 2 years ago

Related to #71

chamini2 commented 1 year ago

I tried the HTML <base> approach, but the tag is printed as text in the README instead: https://pypi.org/project/fal/0.8.1/

Could the HTML <base> tag be allowed, so that this approach at least works when set manually?

solaluset commented 1 year ago

As a workaround, it's possible to modify links dynamically with regex:

import re

GITHUB_URL = "https://github.com/your/repo"

with open("README.md", encoding="utf8") as f:
    long_description = f.read()

# Links on PyPI should have absolute URLs: prefix every Markdown link
# target that doesn't start with http: or https: (in-page #anchors too).
long_description = re.sub(
    r"(\[[^\]]+\]\()((?!https?:)[^\)]+)(\))",
    lambda m: m.group(1) + GITHUB_URL + "/blob/master/" + m.group(2) + m.group(3),
    long_description,
)
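A variation on the same workaround, wrapped in a hypothetical helper. It also leaves `mailto:` links and in-page anchors such as `(#installation)` untouched, and uses `urllib.parse.urljoin` so that `./` and `../` segments are resolved as a browser would. The base URL is passed in, so the branch or tag (`master`, `main`, `v1.2.0`, ...) is up to the caller.

```python
import re
from urllib.parse import urljoin

# Same pattern as the workaround above, except the lookahead also skips
# mailto: links and in-page anchors such as (#installation).
_MD_LINK = re.compile(r"(\[[^\]]+\]\()((?!https?:|mailto:|#)[^\)]+)(\))")


def absolutize_markdown_links(text: str, base: str) -> str:
    """Prefix relative Markdown link targets with base (hypothetical helper)."""
    if not base.endswith("/"):
        base += "/"
    # urljoin resolves ./ and ../ segments in the relative target.
    return _MD_LINK.sub(
        lambda m: m.group(1) + urljoin(base, m.group(2)) + m.group(3), text
    )


readme = "See [usage](./docs/usage.md), [PyPI](https://pypi.org) and [top](#top)."
print(absolutize_markdown_links(readme, "https://github.com/your/repo/blob/main"))
# → See [usage](https://github.com/your/repo/blob/main/docs/usage.md), [PyPI](https://pypi.org) and [top](#top).
```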

zcutlip commented 9 months ago

> As a workaround, it's possible to modify links dynamically with regex:

Interestingly, this regex also works for links to directories. In that case, GitHub sees that you're asking for a directory and helpfully redirects you from blob/main/directory-name/ to tree/main/directory-name/.

Just thought I'd share in case someone stumbles upon this solution with a similar need.