In Fedora, we run the following check when we build Python documentation:
# Verify that all of the local links work
#
# (we can't check network links, as we shouldn't be making network connections
# within a build. Also, don't bother checking the .txt source files; some
# contain example URLs, which don't work)
linkchecker \
--ignore-url=^mailto: --ignore-url=^http --ignore-url=^ftp \
--ignore-url=.txt\$ --no-warnings \
Doc/build/html/index.html
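For readers without linkchecker installed, the idea behind the check above can be approximated with the standard library alone. The following is only a rough sketch, not what Fedora runs: the names `LinkCollector` and `broken_local_links` are hypothetical, it scans every `.html` file under the build directory instead of crawling from `index.html`, it assumes links are relative, and it does not verify `#fragment` anchors.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlsplit

# Same patterns as the --ignore-url options in the linkchecker command.
IGNORED = [re.compile(p) for p in (r"^mailto:", r"^http", r"^ftp", r".txt$")]


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html_root):
    """Return (page, link) pairs whose local target file does not exist."""
    root = Path(html_root)
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            if any(pattern.search(link) for pattern in IGNORED):
                continue
            target, _fragment = urldefrag(link)
            if not target:
                continue  # same-page anchor such as "#section"
            parts = urlsplit(target)
            if parts.scheme or parts.netloc:
                continue  # not a local file (e.g. "javascript:" or "//host/x")
            # Assumption: links are relative to the page that contains them.
            if not (page.parent / unquote(parts.path)).exists():
                broken.append((str(page.relative_to(root)), link))
    return broken
```

Calling `broken_local_links("Doc/build/html")` after a docs build and exiting non-zero when the list is non-empty would give a CI step roughly comparable in spirit to the command above.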
From time to time, it discovers broken links:
https://github.com/python/cpython/pull/15700 https://github.com/python/cpython/pull/20383 https://github.com/python/cpython/pull/20388
It would be really nice if this check ran as part of the CI that builds the documentation.
Side note: linkchecker can be installed via pip, but the released version is not Python 3 compatible. In Fedora, we package it from git.
Note: I would gladly contribute this check, but I have no idea where I should do that.
On Thu, May 28, 2020 at 3:13 PM Miro Hrončok <report@bugs.python.org> wrote:
> Note: I would gladly contribute this check, but I have no idea where I should do that.
I don't know either. I suspect it will have to be with one of the CI/CD providers that cpython uses.
I _think_ it uses three:
a. Travis: cpython/.travis.yml
b. GitHub Actions: .github/workflows/doc.yml
c. Azure Pipelines: .azure-pipelines/docs-steps.yml
Beyond that, no idea. I fear I am also blind here. Still, Google is my friend.
Some high-level questions to consider:
Is it run only when a build of the docs is started? Or should it be done regularly (daily/weekly?) to keep an eye on links so that it's not a surprise when build time comes along?
Does a broken link stop the build, or is it just advisory?
Who sees the results? Are they emailed to someone? A mailing list? Posted somewhere publicly?
Is someone assigned responsibility for acting on the failures?
What counts as a failure? Is a 301 redirect OK? It seems that a 301 might be OK to pass, but someone should know about it to update to the new URL.
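To make the "what counts as a failure" question concrete, here is one possible policy written out as code. It is only an illustration of the questions above, not a decision anyone in this thread has made: the `Verdict` enum, the `classify` function, and the specific status groupings are all assumptions.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    ADVISORY = "advisory"  # report it, but do not fail the build
    FAILURE = "failure"


def classify(status):
    """Map an HTTP status code to a verdict under the assumed policy."""
    if 200 <= status < 300:
        return Verdict.OK
    if status in (301, 308):
        # Permanent redirect: the link still works, but the URL should be updated.
        return Verdict.ADVISORY
    if 300 <= status < 400:
        # Assumption: other redirects are treated as acceptable.
        return Verdict.OK
    return Verdict.FAILURE
```

Under this sketch a 301 passes the check but still shows up in a report, which matches the suggestion that someone should be told to update the URL.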
I am not familiar with the current documentation build process, so forgive me if these are already answered somehow. I'm not looking for answers myself, but providing suggestions.
I think our CI checks already take too long to run and possibly use more than our fair share of global open source resources (provided by GitHub, Travis, and MS Azure), especially considering how infrequently you would expect to find a problem and the low severity of missing one immediately. I think a more appropriate choice would be to set up a buildbot to do such a check; weekly is perhaps often enough, and certainly no more than daily.
Julien, what do you think?
Something rebuilds the online docs once a day. That same something might be appropriate for running a link checker (including external links) once a week, say.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.10', 'docs']
title = 'RFE: Run linkchecker on documentation on the CI'
updated_at =
user = 'https://github.com/hroncok'
```
bugs.python.org fields:
```python
activity =
actor = 'terry.reedy'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation =
creator = 'hroncok'
dependencies = []
files = []
hgrepos = []
issue_num = 40770
keywords = []
message_count = 7.0
messages = ['369892', '369893', '370196', '370202', '370226', '370270', '370353']
nosy_count = 8.0
nosy_names = ['terry.reedy', 'vstinner', 'ned.deily', 'docs@python', 'mdk', 'hroncok', 'amaajemyfren', 'petdance']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue40770'
versions = ['Python 3.10']
```