Open andrewrk opened 1 month ago
https://github.com/codespell-project/codespell looks like a popular and actively maintained option and has a -w flag to write the changes.
stefanzweifel/git-auto-commit-action@v5 is a handy GitHub action to make commits after making changes in CI.
another existing tool: https://github.com/crate-ci/typos. i've used it on a personal project as a ci step, and additionally in my editor at all times. the only false positives i had were E.ACCES and strat (configurable).
is your vision for this to be a spell-checker written in Zig so that it can be compiled and run without additional dependencies? if so, maybe porting one of these options would be a reasonable start
bad ideas:
:x: dependency on github
:x: inability to test locally with zig build
Are we looking to check the spelling of decls? That seems a lot harder, but just curious. To add to that, are we just looking to check .zig files, or all?
What about pinning this issue (so new contributors that want to report spelling errors see this issue)?
I used the suggestion of @paperdave above and tried out using the codespell tool on the zig source tree just to do a preliminary check on what things look like. I ended up with the following rough code. I just extract all of the comments in all of the *.zig files in the repo and then pass them through the codespell tool, then print out a usable report (the exact specifics here are not that important but I'm including this just for reproducibility):
After using this I ended up with ~150 "typos" which I pared down to 116 by expanding the ignore list to include a generated file and adding some words to the allow list. There will (and should) probably be some quibbling over a couple of the included "typos", but the vast majority are actual mistakes and not just preference.
Here is the output I got just for reference:
That's all using the following allow list:
flate, dependee, dependees, re-use, runned, reenable, ECT, strat, re-declare, re-use, HSA, Synopsys, AKS, Numer, crate, Arithmetics, fo, AtLeast
This might be a good starting place for anybody making a more refined tool.
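For anyone picking this up, the workflow described above (pull the comments out of every *.zig file, run them through codespell with an allow list) could be sketched roughly as follows. This is not the script that produced the results above. The function names are made up for illustration, and the comment detection is deliberately naive: it treats the first `//` on a line as the start of a comment, so a `//` inside a string literal will be picked up too. It assumes codespell is installed and on PATH, and uses its `-L` option (comma-separated words to ignore).

```python
import os
import subprocess
import tempfile
from pathlib import Path


def comments_only(source: str) -> str:
    """Blank out everything except comment text, keeping line numbers aligned.

    Naive assumption: the first `//` on a line starts a comment.
    """
    out = []
    for line in source.splitlines():
        idx = line.find("//")
        # lstrip("/!") drops the markers of `//`, `///` and `//!` comments.
        out.append(line[idx:].lstrip("/!").strip() if idx != -1 else "")
    return "\n".join(out) + "\n"


def codespell_command(path: str, ignore_words: list[str]) -> list[str]:
    """Build the codespell argv; -L takes a comma-separated ignore list."""
    cmd = ["codespell"]
    if ignore_words:
        cmd += ["-L", ",".join(ignore_words)]
    cmd.append(path)
    return cmd


def check_tree(root: str, ignore_words: list[str]) -> None:
    """Run codespell over the comments of every .zig file under root."""
    for zig_file in sorted(Path(root).rglob("*.zig")):
        text = comments_only(zig_file.read_text(encoding="utf-8", errors="replace"))
        with tempfile.NamedTemporaryFile(
            "w", suffix=".txt", delete=False, encoding="utf-8"
        ) as tmp:
            tmp.write(text)
        try:
            result = subprocess.run(
                codespell_command(tmp.name, ignore_words),
                capture_output=True,
                text=True,
            )
            # Report against the real source path instead of the temp file.
            print(result.stdout.replace(tmp.name, str(zig_file)), end="")
        finally:
            os.unlink(tmp.name)
```

Something like `check_tree("lib/std", ["strat", "crate", "flate"])` would then print codespell's findings with line numbers that match the original source files, since non-comment lines are blanked rather than removed.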
It would be possible (probably not all that hard) to rewrite the core of the codespell tool in a simpler manner, as it has quite a few knobs and features that I don't think are necessary. Ultimately, spell checking will always require a human at the wheel to make sure puns, jargon, and the like are not caught, but it can be done with minimal intervention.
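To give a rough idea of how small that core could be, here is a sketch of a lookup-based checker. It assumes a dictionary in the `misspelling->correction` line format that codespell's word lists use. Everything else (the function names, the word regex, the case-insensitive matching) is a simplification chosen for illustration, not how codespell itself is implemented.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def parse_dictionary(text: str) -> dict[str, str]:
    """Parse `misspelling->correction` lines into a lookup table."""
    table = {}
    for line in text.splitlines():
        if "->" in line:
            wrong, right = line.split("->", 1)
            table[wrong.strip().lower()] = right.strip()
    return table


def find_typos(text: str, table: dict[str, str], allow=frozenset()):
    """Return (line_number, word, suggestion) for each known misspelling."""
    allowed = {w.lower() for w in allow}
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group(0)
            key = word.lower()
            # Only flag words that are in the table and not allow-listed.
            if key in table and key not in allowed:
                hits.append((lineno, word, table[key]))
    return hits
```

Because it only flags words that appear in a list of known misspellings, a checker like this stays quiet on jargon and identifiers by default, and the allow list handles the remaining cases such as `strat`.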
Just as a footnote to an already-big-enough comment: here is a discussion of spellchecking comments in the Linux kernel that I think might be of interest to anyone reading this thread: http://www.kegel.com/kerspell/
Great results! Would you mind placing the larger blocks into collapsible regions, please?
Fixed (I thought it would do that automatically, oops)
We get a lot of spelling error corrections, especially from people looking to do their first contribution to the project.
This wastes computer and human resources because it takes a lot less time to solve this in bulk. I don't want to merge 1,000 PRs each fixing one word even if those 1,000 PRs are spread among many years.
However, we cannot ignore this issue when the spelling errors are in doc comments because it can cause problems with grep and searching in autodocs.
This issue is to implement some kind of automated spelling checker that has no false positives, yet catches everything that would otherwise be caught by a contributor, and add it to our CI so that we can stop this problem once and for all.
In the meantime please pick something substantial to work on. If you want to fix spelling errors then fix all of them, including the ones from the future.
P.S. Please don't suggest some "skip CI" mechanism.