suborbital / docs

Documentation monorepo for Suborbital projects and products
https://docs.suborbital.dev
Apache License 2.0

Create spellcheck.yml #115

Closed: LauraLangdon closed this 2 years ago

LauraLangdon commented 2 years ago

Add a spell check to the docs workflow.

LauraLangdon commented 2 years ago

I need to add more words to spelling.dic; it's flagging all kinds of legitimate terms we use.

hola-soy-milk commented 2 years ago

Looking amazing!

flaki commented 2 years ago

A couple of thoughts on this, based on the flagged (false) positives:

The docs are not plain Markdown but MDX, that is, Markdown plus JSX (so in practice they are as much HTML as they are Markdown):

Misspelled words:
<markdown> website/docs/atmo/runnable-api/graphql-client.md
--------------------------------------------------------------------------------
…
MultiLanguageCodeBlock
…
TabItem
…
groupId
href
…

We would definitely need to filter these or build a custom dictionary that takes care of this.
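One way to filter out the MDX/JSX noise would be to pre-process each file before the spellchecker sees it. The sketch below is a hypothetical Python helper (`strip_jsx` is an illustrative name, not part of pyspelling or of this repo). It assumes no `<` or `>` appears inside a tag's attributes, drops ES module `import`/`export` lines, and removes every JSX/HTML tag, so component names like `TabItem` and attribute names like `groupId` and `href` disappear while the prose between tags is kept.

```python
import re

# Hypothetical MDX pre-processing step (not part of pyspelling or this repo).
# Matches opening, closing and self-closing JSX/HTML tags, e.g.
# <TabItem value="go">, </TabItem>, <MultiLanguageCodeBlock />.
# Assumption: no '<' or '>' appears inside a tag's attributes, so
# {a > b} expressions in props are NOT handled.
TAG_RE = re.compile(r"</?[A-Za-z][\w.:-]*(?:\s[^<>]*)?/?>")

# MDX files can contain ES module lines such as `import X from '...'`.
# Any line starting with lowercase "import " or "export " is dropped.
ESM_RE = re.compile(r"^(?:import|export)[ \t].*$", re.MULTILINE)


def strip_jsx(mdx: str) -> str:
    """Remove ESM import/export lines and JSX/HTML tags, keeping text between tags."""
    without_esm = ESM_RE.sub("", mdx)
    return TAG_RE.sub("", without_esm)


if __name__ == "__main__":
    # Illustrative input; the import path is made up for this example.
    sample = (
        "import TabItem from './TabItem';\n"
        "\n"
        '<TabItem value="go" label="Go">\n'
        'Read the docs at <a href="https://docs.suborbital.dev">the site</a>.\n'
        "</TabItem>\n"
    )
    # Only the prose line keeps any words: "Read the docs at the site."
    print(strip_jsx(sample))
```

Being regex-based, this is a blunt filter: it will also strip anything in prose that happens to look like a tag, so a real MDX parser would be the safer long-term option.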

On the other hand, the spellcheck seems to indiscriminately crawl and flag all code blocks:

Misspelled words:
<markdown> website/docs/grav/usage/getting-started/request-reply.md
--------------------------------------------------------------------------------
Grav
MsgReceipt
MsgTypeDefault
NewMsg
OnReply
Println
RPC
…
pre
reciepts
requestReply
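One way to keep code blocks like these out of the check would be a similar pre-processing pass that drops fenced blocks and inline code spans. This is again a hypothetical sketch (`strip_code` is an illustrative name, not a pyspelling feature), assuming fences are exactly three backticks or three tildes at the start of a line; longer or indented fences are not handled.

```python
import re

# Hypothetical Markdown pre-processing step (illustrative, not from pyspelling).
# A fenced block opens with three backticks or three tildes at the start of a
# line and closes at the next line consisting of the same three characters
# (plus optional trailing spaces). Longer or indented fences are not handled.
FENCE_RE = re.compile(r"^(`{3}|~{3}).*?^\1[ \t]*$", re.MULTILINE | re.DOTALL)

# Inline code spans on a single line, e.g. `requestReply`.
INLINE_CODE_RE = re.compile(r"`[^`\n]+`")


def strip_code(markdown: str) -> str:
    """Remove fenced code blocks first, then inline code spans."""
    without_fences = FENCE_RE.sub("", markdown)
    return INLINE_CODE_RE.sub("", without_fences)


if __name__ == "__main__":
    fence = "`" * 3  # built at runtime so this example's own fence stays intact
    sample = (
        "Call `requestReply` to wait for a reply.\n\n"
        f"{fence}go\n"
        'fmt.Println("NewMsg")\n'
        f"{fence}\n\n"
        "Replies arrive as receipts.\n"
    )
    print(strip_code(sample))
```

Note that this hides genuine typos inside code too (such as "reciepts" above), which is the trade-off of skipping code wholesale.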

The code-block noise will improve once we move a good chunk (but not necessarily all) of the code snippets out of the Markdown files. In certain cases, though, it would make a lot of sense to keep checking code, for example for the runnable APIs, but against a generated wordlist:

Misspelled words:
<markdown> website/docs/atmo/runnable-api/logging.md
--------------------------------------------------------------------------------
…
LogDebug
LogErr
LogInfo
LogWarn
…
logDebug
logErr
logInfo
logWarn
…

The same applies to other things, like the client libraries. This would of course be a "blunt weapon"; a smarter approach would be proper tests (à la #81).
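The generated-wordlist idea could look roughly like the following hypothetical sketch: take a list of API identifiers (how that list is produced, e.g. by scanning the Runnable API source, is assumed and not shown), add both the PascalCase and camelCase spelling of each, since the logging page is flagged for both `LogInfo` and `logInfo`, and write one word per line. `api_wordlist` and `write_wordlist` are illustrative names, and the one-word-per-line format is an assumption to check against the spellchecker's documentation.

```python
# Hypothetical helper: turn a list of Runnable API identifiers into wordlist
# entries. How the identifier list is produced (e.g. by scanning the API
# source) is assumed and not shown here.
def api_wordlist(identifiers: list[str]) -> list[str]:
    """Return sorted, de-duplicated PascalCase and camelCase spellings."""
    words: set[str] = set()
    for name in identifiers:
        if not name:
            continue  # skip empty entries
        words.add(name[0].upper() + name[1:])  # e.g. LogInfo
        words.add(name[0].lower() + name[1:])  # e.g. logInfo
    return sorted(words)


def write_wordlist(path: str, identifiers: list[str]) -> None:
    """Write the entries one per line (assumed wordlist format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(api_wordlist(identifiers)) + "\n")


if __name__ == "__main__":
    # Example input: the names flagged on the logging page.
    names = ["LogDebug", "LogErr", "LogInfo", "LogWarn"]
    for word in api_wordlist(names):
        print(word)
```

Like any wordlist, this only confirms that a word is spelled like a known API name, not that a snippet uses it correctly; that part is what proper tests would cover.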

LauraLangdon commented 2 years ago

> We would definitely need to filter these or build a custom dictionary that takes care of this.

Yep! I'm slowly working my way through all the false positives and adding them to spelling.dic.

> On the other hand, the spellcheck seems to indiscriminately crawl and flag all code blocks

From today's docs meeting notes:

Flaki: it looks like the spellchecker uses the pyspelling Markdown filter (https://facelessuser.github.io/pyspelling/filters/markdown/). I don't see an option for this out of the box (nor for JSX/MDX), so this might not be possible without modifying that filter manually. cc @arbourd