wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Feature request: check section links in URLs #269

Closed wummel closed 11 years ago

wummel commented 11 years ago

Converted from SourceForge issue 3200109, submitted by gwern

I link to Wikipedia hundreds of times on my personal website, http://www.gwern.net/ . Often I make clearer how I am using it as a reference, or what part the reader ought to be reading, by pointing to a specific section in an article. (Perhaps I need to link to http://en.wikipedia.org/wiki/List_of_Nadia:_The_Secret_of_Blue_Water_characters#Nadia or to http://en.wikipedia.org/wiki/Sleep#Sleep_stages)

However, Wikipedia (and probably lots of other sites) sometimes changes the naming of the sections. The link itself doesn't necessarily break - if the section in the Sleep example was renamed 'stages of sleep', the URL would still return content and not a 404. But from my perspective, the link has broken.

The FAQ and man page do not seem to mention any section checking, and over the half year I've used linkchecker, I don't recall it reporting any broken section links (even though as I said, Wikipedia articles are always changing and I use them all the time). So I infer that linkchecker currently isn't doing any checking.

So, it'd be useful if linkchecker could check whether the identifier after the # has a corresponding div with that identifier in the HTML source.

(One might want to make it an option. Since the # being broken doesn't cause any high-level issues with the browser, I've seen it abused to make URLs unique; for example, to resubmit URLs to Reddit or Hacker News. There are probably other hacks whose presence one wouldn't want to be warned about.)
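
A minimal sketch of the check described above, in Python. This is not linkchecker's code; the names AnchorCollector and fragment_is_valid are made up for illustration. It assumes a fragment counts as valid when any element has a matching id attribute or an <a> tag has a matching name attribute, and that the page's HTML has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, unquote


class AnchorCollector(HTMLParser):
    """Collect every value that a URL fragment could point to (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Assumption: any element's id is a target; <a name="..."> is the older form.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_is_valid(url, html):
    """Return True if url has no fragment, or its fragment matches an anchor in html."""
    fragment = unquote(urldefrag(url).fragment)
    if not fragment:
        return True
    collector = AnchorCollector()
    collector.feed(html)
    return fragment in collector.anchors


# Usage with a hand-written page fragment (not real Wikipedia markup):
page = '<h2><span id="Sleep_stages">Sleep stages</span></h2>'
print(fragment_is_valid("http://en.wikipedia.org/wiki/Sleep#Sleep_stages", page))
print(fragment_is_valid("http://en.wikipedia.org/wiki/Sleep#Stages_of_sleep", page))
```

With the sample page, the first URL passes and the second, renamed section fails, which is the kind of breakage a plain HTTP status check never notices.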

wummel commented 11 years ago

Submitted by calvin

Linkchecker supports checking sections, although it calls them anchors. Use the --anchors command line option or set anchors=1 in the [checking] section of the configuration file. You'll get warnings for all invalid anchors.