johnthagen closed 3 years ago
@manuzhang I added a "failing" test page that should generate a warning but currently does not.
Unlike c637c44531e5d654581721a04c10bf28cae141a8, we can't use the soup that is passed into `get_url_status`, because that is for the current page, not the one that is being linked to.
It seems like we would need to resolve the URL (e.g. `index.html#BAD_ANCHOR`) and then parse the destination using BeautifulSoup to verify that the anchor is correct. This seems challenging given the user could be running `mkdocs build` rather than `mkdocs serve`, so there wouldn't be anything that could be queried directly using `requests`.
Perhaps the `on_post_build()` hook could be used somehow.
If in our current page we have a link such as:
[link](index.md#elephant)
And `index.md` has a header such as:
# Elephant
We need to search the page built from `index.md` for:
<a href="#elephant" class="nav-link">Elephant</a>
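A rough sketch of that check against built HTML follows. It is only an illustration: it uses the standard library's `html.parser` rather than BeautifulSoup to stay dependency-free, and `AnchorCollector` and `page_has_anchor` are hypothetical names, not functions from this plugin.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every value that a URL fragment could plausibly point at."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element id is a valid fragment target.
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        # Legacy <a name="..."> targets.
        if tag == "a" and attrs.get("name"):
            self.anchors.add(attrs["name"])
        # In-page nav links such as <a href="#elephant">, the form shown above.
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.anchors.add(href[1:])


def page_has_anchor(html, anchor):
    """Return True if the built page contains a target for the anchor."""
    collector = AnchorCollector()
    collector.feed(html)
    return anchor in collector.anchors
```

This only works once the target page has been built, which is why the timing of the check (for example after the build finishes) matters.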
Another option to consider is that technically we could try to parse the target markdown source file rather than trying to locate and query the actual built HTML.
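As a minimal sketch of the source-parsing option, the helper below pulls heading text out of a Markdown file. The name `markdown_headings` is hypothetical, and the sketch makes simplifying assumptions: it only recognizes ATX (`#`) headings, ignores Setext (underlined) headings, and treats any line starting with a code fence as toggling a fenced block.

```python
import re

# ATX heading: up to 3 spaces, 1-6 '#', whitespace, text, optional closing '#'s.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
FENCE = re.compile(r"^ {0,3}(```|~~~)")


def markdown_headings(text):
    """Return the text of each ATX heading, skipping fenced code blocks."""
    headings = []
    in_fence = False
    for line in text.splitlines():
        if FENCE.match(line):
            # Simplification: every fence line opens or closes a block.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = ATX_HEADING.match(line)
        if match:
            headings.append(match.group(2))
    return headings
```

Skipping fenced blocks matters because a `# comment` line inside a shell or Python example would otherwise be mistaken for a heading.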
@manuzhang I have implemented a method that finds the target Markdown source and validates it contains a Markdown header for the cross-page anchor. Could you give this a review and tell me what you think?
This still needs a bit more work to handle headers with multiple words separated by spaces.
Here are GitLab's rules for how anchors are generated from Markdown headers: https://stackoverflow.com/a/43276249
MkDocs seems to do something similar. It may be that trying to go backwards from URL anchor to Markdown header is a bit too complex.
@johnthagen Thanks for the continued investigation on this task. I might only have time to check and test over the weekend.
@manuzhang No problem. This issue may end up being very difficult to address, so we may have to abandon it. Alternatively, you or someone else may come along with a better way to solve it than I have been able to come up with.
Another idea could be to try to use MkDocs' or python-markdown's actual functionality to slugify identified Markdown headers and then compare them with the anchor being checked.
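The forward direction (header to slug, then compare) might look roughly like the sketch below. python-markdown's `toc` extension ships a `slugify` helper. The stand-in here only approximates its default behavior so the example stays self-contained, and it is an assumption rather than a copy. `anchor_matches_heading` is a hypothetical name, and suffixes added to duplicate headings are not handled.

```python
import re
import unicodedata


def slugify(value, separator="-"):
    """Approximate a default heading-to-anchor slug (assumed behavior)."""
    # Drop accents and any characters that cannot be represented in ASCII.
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then trim and lowercase.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into one separator.
    return re.sub(r"[\s-]+", separator, value)


def anchor_matches_heading(anchor, headings):
    """Return True if any heading slugifies to the given anchor."""
    return any(slugify(heading) == anchor for heading in headings)
```

Going from header to slug avoids having to reverse the slug rules, which is the part that looked too complex above.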
Heading checking seems to be working, but testing this on a larger project revealed that `find_source_markdown()` needs more work. It doesn't handle nested Markdown files that have relative links between them.
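One way to handle nested files is to resolve each link relative to the directory of the page that contains it. The sketch below is not the actual `find_source_markdown()` implementation. `resolve_markdown_target` is a hypothetical helper, and it assumes source paths are POSIX-style and relative to the docs directory.

```python
import posixpath
from urllib.parse import urlsplit


def resolve_markdown_target(current_source, link):
    """Return (target_source_path, anchor) for a link found in current_source."""
    parts = urlsplit(link)
    if not parts.path:
        # A bare "#anchor" link points back into the current page.
        return current_source, parts.fragment
    base_dir = posixpath.dirname(current_source)
    # normpath collapses "../" segments against the linking page's directory.
    target = posixpath.normpath(posixpath.join(base_dir, parts.path))
    return target, parts.fragment
```

For example, a link to `../usage.md#run` inside `guide/setup/install.md` resolves to `guide/usage.md` with the anchor `run`. A result that still starts with `../` would mean the link escapes the docs directory and would need separate handling.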
@manuzhang This is ready to review now. I tried out this PR on a large MkDocs project I maintain and it found two true errors in anchors that otherwise would not have been detected. It did not have any false positives in my project either.
@johnthagen Thanks for the nice work!
@manuzhang Sure! I think it would be good to cut a new release with this feature included for people to try out.
Closes #18