rust-lang / mdBook

Create book from markdown files. Like Gitbook but implemented in Rust
https://rust-lang.github.io/mdBook/
Mozilla Public License 2.0

Add a linkchecker #356

Closed projektir closed 4 years ago

projektir commented 7 years ago

A linkchecker is a convenient tool for avoiding errors and keeping track of dead links. It would also be nice to have for #308.

The RBE linkchecker and the one used in rust-lang/rust both scan the files line-by-line while applying a regex to find links and check them. I don't know if it'd be OK for us to adopt the rust-lang/rust one? @steveklabnik
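As a rough illustration of the line-by-line approach described above, here is a minimal sketch. It is not the code from rust-lang/rust or RBE: the names `FoundLink` and `find_links` are made up for this example, and a plain substring search stands in for the regex so that the snippet needs only the standard library. It only recognises inline links of the form `[label](target)`.

```rust
/// A link target together with the 1-based line it was found on.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
struct FoundLink {
    line: usize,
    target: String,
}

/// Scan `text` line by line and collect the targets of inline Markdown
/// links such as `[label](target)`.
///
/// Assumption: a substring search replaces the regex used by the real
/// tools. Reference-style links, code blocks and targets containing `)`
/// are not handled.
fn find_links(text: &str) -> Vec<FoundLink> {
    let mut found = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        let mut rest = line;
        // Every inline link target starts right after "](" and ends at ")".
        while let Some(start) = rest.find("](") {
            let after = &rest[start + 2..];
            match after.find(')') {
                Some(end) => {
                    found.push(FoundLink {
                        line: idx + 1,
                        target: after[..end].to_string(),
                    });
                    // Keep scanning the remainder of the same line.
                    rest = &after[end + 1..];
                }
                // Unterminated link: give up on this line.
                None => break,
            }
        }
    }
    found
}
```

Each collected target could then be checked: local paths against the file system, web URLs with an HTTP request.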

I think a good place for this would be mdbook test.

Michael-F-Bryan commented 7 years ago

This would be a good candidate for the plugin system (#163). Ideally, after the rendering stage you'd be able to make a plugin which gets passed the rendered output's location and then checks all the links in all the *.html files it can find.
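To make the idea concrete, the first step of such a post-render check might look like the sketch below. It only covers finding the `*.html` files under the output directory. The function name `collect_html_files` is an assumption for this example and is not part of mdbook's API, and how the plugin would actually receive the output location is left open.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `*.html` file below `dir` into `out`.
/// (Hypothetical helper, not an mdbook API.)
fn collect_html_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into sub-directories such as `cli/` or `format/`.
            collect_html_files(&path, out)?;
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("html") {
            out.push(path);
        }
    }
    Ok(())
}
```

A checker would then parse each collected file, pull out the link targets, and verify them.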

We're planning on refactoring the current system to make it a lot easier to write your own plugins and renderers.

azerupi commented 7 years ago

> I think a good place for this would be mdbook test

Definitely in mdbook test

> This would be a good candidate for the plugin system

I agree, this could potentially be written as a plugin in the future :) I emphasised "in the future" because I don't want to stall progress on changes that are coming soon-ish. We don't have a deadline for the plugin system, so if someone wants to contribute a solution right now, I wouldn't want to break their inertia.

However, we can keep this use case in the back of our minds when doing the refactorings, to make it indeed possible to implement this as a plugin later. :)

steveklabnik commented 7 years ago

I've thought about trying to put the rust-lang one on crates.io so others could use it too, to be honest.

budziq commented 7 years ago

> Definitely in mdbook test

It might be nice to have it as a warning also on mdbook build stage

> I've thought about trying to put the rust-lang one on crates.io so others could use it too, to be honest.

@steveklabnik That would be awesome!

Michael-F-Bryan commented 6 years ago

For anyone interested, I've started playing around with an mdbook-linkcheck backend for checking links. You'll need to install mdbook directly from master, and it isn't 100% finished yet, but it may be useful for some people.

EDIT: It looks like the tool works, because I've already found my first batch of broken links, rust-lang/rust-by-example#990 :tada:

Example Output

This is the output (with very verbose logging) when the tool is run over mdbook's user guide:

$ RUST_LOG=mdbook_linkcheck cargo run -- -s ~/Documents/forks/mdBook/book-example
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/mdbook-linkcheck -s /home/michael/Documents/forks/mdBook/book-example`
 INFO 2018-01-13T13:38:49Z: mdbook_linkcheck: Checking for broken links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: Config {
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck:     follow_web_links: false
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: }
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck: Finding all links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://www.rust-lang.org" in README.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook" in README.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in README.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook/*/mdbook/" in README.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.mozilla.org/MPL/2.0/" in README.md#15
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://crates.io/crates/mdbook" in cli/cli-tool.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.rust-lang.org/" in cli/cli-tool.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.rust-lang.org/downloads.html" in cli/cli-tool.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://crates.io/" in cli/cli-tool.md#20
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook" in cli/cli-tool.md#27
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "format/summary.html" in cli/init.md#25
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in cli/watch.md#26
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in cli/serve.md#40
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://doc.rust-lang.org/stable/book/" in cli/test.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://handlebarsjs.com/" in format/theme/theme.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/index-hbs.md#90
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://highlightjs.org" in format/theme/syntax-highlighting.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/syntax-highlighting.md#56
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://www.mathjax.org/" in format/mathjax.md#3
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "" in format/rust.md#38
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook" in lib/index.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html" in lib/index.md#33
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "http://www.linfo.org/rule_of_silence.html" in lib/index.md#165
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/mdinger" in misc/contributors.md#7
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/kbknapp" in misc/contributors.md#8
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/steveklabnik" in misc/contributors.md#9
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/asolove" in misc/contributors.md#10
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/waynenilsen" in misc/contributors.md#11
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/funkill" in misc/contributors.md#12
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/FuGangqiang" in misc/contributors.md#13
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/Michael-F-Bryan" in misc/contributors.md#14
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::links: Found "https://github.com/cspiegel" in misc/contributors.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck: Found 32 links
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://www.rust-lang.org" in README.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://www.rust-lang.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook" in README.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in README.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook/*/mdbook/" in README.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook/*/mdbook/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.mozilla.org/MPL/2.0/" in README.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.mozilla.org/MPL/2.0/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://crates.io/crates/mdbook" in cli/cli-tool.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://crates.io/crates/mdbook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.rust-lang.org/" in cli/cli-tool.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.rust-lang.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.rust-lang.org/downloads.html" in cli/cli-tool.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.rust-lang.org/downloads.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://crates.io/" in cli/cli-tool.md#20
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://crates.io/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook" in cli/cli-tool.md#27
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "format/summary.html" in cli/init.md#25
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Searching for /home/michael/Documents/forks/mdBook/book-example/src/format/summary.md
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in cli/watch.md#26
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in cli/serve.md#40
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://doc.rust-lang.org/stable/book/" in cli/test.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://doc.rust-lang.org/stable/book/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://handlebarsjs.com/" in format/theme/theme.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://handlebarsjs.com/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/index-hbs.md#90
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://highlightjs.org" in format/theme/syntax-highlighting.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://highlightjs.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/rust-lang-nursery/mdBook/issues" in format/theme/syntax-highlighting.md#56
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/rust-lang-nursery/mdBook/issues"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://www.mathjax.org/" in format/mathjax.md#3
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://www.mathjax.org/"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "" in format/rust.md#38
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck: Error for "" in format/rust.md#38, The link is empty
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook" in lib/index.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html" in lib/index.md#33
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://docs.rs/mdbook/*/mdbook/renderer/struct.RenderContext.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "http://www.linfo.org/rule_of_silence.html" in lib/index.md#165
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "http://www.linfo.org/rule_of_silence.html"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/mdinger" in misc/contributors.md#7
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/mdinger"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/kbknapp" in misc/contributors.md#8
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/kbknapp"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/steveklabnik" in misc/contributors.md#9
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/steveklabnik"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/asolove" in misc/contributors.md#10
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/asolove"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/waynenilsen" in misc/contributors.md#11
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/waynenilsen"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/funkill" in misc/contributors.md#12
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/funkill"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/FuGangqiang" in misc/contributors.md#13
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/FuGangqiang"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/Michael-F-Bryan" in misc/contributors.md#14
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/Michael-F-Bryan"
TRACE 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Checking "https://github.com/cspiegel" in misc/contributors.md#15
DEBUG 2018-01-13T13:38:49Z: mdbook_linkcheck::validation: Ignoring "https://github.com/cspiegel"
There were 1 broken links

format/rust.md#38: The link is empty

projektir commented 6 years ago

@Michael-F-Bryan so rust-lang/rust already has a linkchecker, which is the one we originally wanted to pull out and turn into a crate (I'm not sure what that means for plugins). It has some problems, though, that yours doesn't have (for instance, this fix is really needed), but it also does some things yours doesn't (check for absolute paths).

Idk if we want to have these out of sync given that rust-lang/rust's linkchecker would run on every x.py build for all the books that it manages.

Michael-F-Bryan commented 6 years ago

> Idk if we want to have these out of sync given that rust-lang/rust's linkchecker would run on every x.py build for all the books that it manages.

My original hopes were that this could supplement (or even succeed?) their link checker, although on further inspection they do a lot of cross-site linking (i.e. using links to files outside the book such as ../../std/prelude/index.html). My linkchecker works purely with the source book and doesn't take into account the fact that other things exist on the Rust S3 bucket, so I don't know whether this is still possible.

That said, the entire idea behind enabling alternate backends is that people can write their own tools to suit their exact use case. For example, it was almost trivial to knock up a backend which runs everything through rust-skeptic, which is something Rust By Example currently needs to do manually.

> but it also does some things yours doesn't (check for absolute paths).

This part was tricky. I originally treated relative and absolute paths separately (relative links are relative to the chapter's directory, absolute links are relative to src/) but found that most of the links in Rust By Example used a completely different convention. We use the <base> tag to tweak how links get resolved by your browser, so what I detected as a "broken link" turned out to still work fine when viewing online.
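The two ways of resolving a link can be sketched roughly as follows. This is an illustration of the conventions described above, not the actual mdbook-linkcheck code: `Convention` and `resolve` are names invented for this example, and treating every link as relative to `src/` is only an assumption about what the `<base>` tag effectively does.

```rust
use std::path::{Path, PathBuf};

/// How a non-web link is mapped onto the book's source tree.
/// (Hypothetical type, introduced only for this sketch.)
enum Convention {
    /// Relative links resolve against the chapter's directory;
    /// links with a leading `/` resolve against `src/`.
    ChapterRelative,
    /// Every link resolves against `src/` (assumed effect of `<base>`).
    RootRelative,
}

/// Map `link`, found in `chapter` (a path relative to `src`), to the
/// source file it should point at. Fragments such as `#section` are
/// not handled here.
fn resolve(src: &Path, chapter: &Path, link: &str, convention: Convention) -> PathBuf {
    let base = match convention {
        Convention::ChapterRelative if !link.starts_with('/') => {
            src.join(chapter.parent().unwrap_or(Path::new("")))
        }
        _ => src.to_path_buf(),
    };
    let mut target = base.join(link.trim_start_matches('/'));
    // Links name the rendered `.html` page; the source chapter is `.md`.
    if target.extension().and_then(|ext| ext.to_str()) == Some("html") {
        target.set_extension("md");
    }
    target
}
```

Under `RootRelative`, the link `format/summary.html` in `cli/init.md` maps to `src/format/summary.md`, which matches the "Searching for" line in the example output above. Under `ChapterRelative`, the same link would map to `src/cli/format/summary.md` and be reported as broken.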

Michael-F-Bryan commented 4 years ago

@ehuss, the mdbook-linkcheck backend already exists and does a pretty good job.

Are we happy with letting the ecosystem provide a linkchecker instead of making it part of mdbook?

ehuss commented 4 years ago

I think that's up to you! It seems unlikely that anyone is going to develop a new link checker. If you're asking if you want to migrate mdbook-linkcheck as a built-in, I think that's also up to you. Or if you're asking if this issue should just be closed, I'm fine with that, too.

Michael-F-Bryan commented 4 years ago

I think we can close the issue. The mdbook-linkcheck crate should fill this niche well enough.

ehuss commented 4 years ago

Sounds reasonable!