rust-lang / mdBook

Create book from markdown files. Like Gitbook but implemented in Rust
https://rust-lang.github.io/mdBook/
Mozilla Public License 2.0

Add a spellchecker #357

Open budziq opened 7 years ago

budziq commented 7 years ago

It would be nice to have a spellchecker (in the cookbook we constantly fight typos).

The Rust book uses a shell script that runs aspell. We might consider using https://crates.io/crates/ispell as part of mdbook test (and possibly as a warning in mdbook build).

azerupi commented 7 years ago

One thing I really want to stress here is that I want to avoid an English-centric solution at all costs. I think it would be a shame to lock ourselves in with a solution for "the majority".

For the rest, I don't know much about spellchecking tools, so I'm open to everything. I also think this is a very desirable feature.

Things to think about:

prertik commented 6 years ago

I want to work on this. Any guidelines?

Michael-F-Bryan commented 6 years ago

Now that mdbook 0.1.0 is released we can get started on adding a spell-checker backend. I was thinking this would be a nice medium-sized project for someone who's already familiar with Rust. It'll be quite similar to mdbook-linkcheck, so I imagine you can probably steal some ideas from there. Otherwise, there's the alternate backends tutorial in the User Guide.

@prertik, are you still interested in working on this?

Rough Outline

My thinking is that you'd run something like the ispell crate on the contents of each chapter. Looking at what they do in The Book, you'd have two modes. A "check" mode purely checks against a generic dictionary and, optionally, a project-specific dictionary. A "generate" mode produces a list of what ispell thinks are errors so the user can go through and remove false positives.
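To make the two modes concrete, here is a minimal sketch using only the standard library. It does not call ispell at all: plain `HashSet` lookups stand in for the real spell checker, and the names `Mode`, `unknown_words`, and `run` are hypothetical, chosen just for illustration.

```rust
use std::collections::{BTreeSet, HashSet};

/// The two modes discussed above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Check,
    Generate,
}

/// Collect the words in `text` that neither dictionary knows,
/// lowercased, deduplicated, and sorted.
/// Assumption: naive tokenization that splits on any non-alphabetic character.
fn unknown_words(
    text: &str,
    generic: &HashSet<String>,
    project: &HashSet<String>,
) -> BTreeSet<String> {
    text.split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .filter(|w| !generic.contains(w) && !project.contains(w))
        .collect()
}

/// Check mode: `Err` with the offending words if any are unknown.
/// Generate mode: `Ok` with one unknown word per line, ready to be
/// reviewed and saved as the project-specific dictionary.
fn run(
    mode: Mode,
    text: &str,
    generic: &HashSet<String>,
    project: &HashSet<String>,
) -> Result<String, Vec<String>> {
    let unknown = unknown_words(text, generic, project);
    match mode {
        Mode::Check if unknown.is_empty() => Ok(String::new()),
        Mode::Check => Err(unknown.into_iter().collect()),
        Mode::Generate => Ok(unknown.into_iter().collect::<Vec<_>>().join("\n")),
    }
}
```

In this sketch the generate output still contains genuine typos alongside project jargon, so the user would review the list before committing it.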

Thanks to #541 users will be able to temporarily change the mode via an environment variable. So you could invoke MDBOOK_OUTPUT__ISPELL__MODE=generate mdbook build to do a once-off dictionary generation, or override things in CI without having to touch book.toml.
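As a rough illustration of how such a variable name could map onto a book.toml key, the sketch below assumes the rule described in the mdBook documentation: the MDBOOK_ prefix is stripped, a double underscore separates nested keys, a single underscore becomes a dash, and everything is lowercased. The function name `env_to_config_key` is hypothetical and is not part of mdBook's API.

```rust
/// Map an environment variable name to a dotted config key.
/// Returns `None` when the name lacks the `MDBOOK_` prefix.
/// Assumption: `__` separates nested keys and `_` becomes `-`.
fn env_to_config_key(var: &str) -> Option<String> {
    let rest = var.strip_prefix("MDBOOK_")?;
    let key = rest
        .split("__")
        .map(|part| part.to_lowercase().replace('_', "-"))
        .collect::<Vec<_>>()
        .join(".");
    Some(key)
}
```

Under that assumption, MDBOOK_OUTPUT__ISPELL__MODE would address the mode key of the [output.ispell] table.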

The project-specific dictionary should probably live next to book.toml. That way it gets version controlled and everyone automatically benefits.

You can use pulldown-cmark to scan through and only check the text sections, if so desired.
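The idea of checking only the text sections can be sketched without any dependencies. To be clear, this is not pulldown-cmark: it is a naive line-based stand-in that only skips fenced code blocks, and a real backend would more likely walk the parser's events and keep just the text. The function name `prose_lines` is hypothetical.

```rust
/// Return the lines of `markdown` that lie outside fenced code blocks.
/// Assumption: fences are lines starting with three backticks; inline code,
/// indented code blocks, and HTML are not handled by this sketch.
fn prose_lines(markdown: &str) -> Vec<&str> {
    // Build the fence marker at runtime so this example can itself
    // sit inside a fenced block without confusing Markdown renderers.
    let fence = "`".repeat(3);
    let mut in_fence = false;
    let mut out = Vec::new();
    for line in markdown.lines() {
        if line.trim_start().starts_with(fence.as_str()) {
            // Opening or closing fence: toggle state and drop the line.
            in_fence = !in_fence;
            continue;
        }
        if !in_fence {
            out.push(line);
        }
    }
    out
}
```

Only the returned lines would then be handed to the spell checker, so identifiers inside code samples do not show up as false positives.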

I would recommend having a look at the spellcheck.sh used for the second edition of the book. This gives you a rough outline of what's being done already.

To make sure this isn't English-centric, you'd want to provide a language = "en" option in book.toml. This would let you select which dictionary to use, and could perhaps accept an array (e.g. language = ["fr", "en", "ru"]) if your book contains multiple languages.

Configuration might look something like this:

[output.ispell]
language = "en"
mode = "check" 
project-dictionary = "words.txt"

prertik commented 6 years ago

I'd love to, but I'm sorry @Michael-F-Bryan, I currently have a very tight schedule. Thank you for the awesome guidelines.

frewsxcv commented 10 months ago

This is quite a complex feature to add, and it's frankly a lot easier to just install a spellchecker in your IDE or utilize an existing markdown spellcheck tool. Thoughts on closing?