Open jyn514 opened 4 years ago
Maybe we should also ping the author here @drahnr. He may have some ideas?
@pickfire @jyn514 glad to hear there is interest in using cargo-spellcheck
!
The error you see is very most likely caused by an older version which had issues parsing args properly, note that I plan to release v0.3.0
is planned for this WE / early next week which should resolve quite a few issues.
In the meantime you could give yesterdays alpha.6
release a shot which should have the traversal + arg parse issues resolved (at least works in the CI https://ci.spearow.io/teams/main/pipelines/cargo-spellcheck/jobs/master-validate/builds/117 ) note that the garbled output for multiline comments is the biggest remaining topic for multiline comments in this alpha which will be fixed in v0.3.0
If there are specific issues I would be very happy to get things working for you :)
It has a dependency on autotools and libtool which is unfortunate ... could we get them to avoid that somehow? -- @jyn514 https://github.com/rust-lang/rust/pull/74141#issuecomment-663111614
@drahnr What do you think?
@drahnr What do you think?
@pickfire As @euclio already stated in https://github.com/rust-lang/rust/pull/74141#issuecomment-663180481 - it should not be required anymore since v0.3.0-alpha.4
/ c041dca66b966610d216ec23aeb4cf386adcf54a which incorporates the latest hunspell-sys
which uses cc
without any C
-buildtools to create the static lib from hunspell source files, so that should be :ballot_box_with_check: if not please open a ticket :)
It doesn't work and probably displayed the wrong error.
Error: Failed to parse manifest file /home/ivan/src/pickfire/rs/rust/src/bootstrap/Cargo.toml: No such file or directory (os error 2)
@pickfire could you please open an issue with information on how to reproduce and commit sha / release so I can test / reproduce the issue?
v0.3.0-beta.1
should resolve this and a few more issues with the rust lang code base, let me know if this is sufficient for the time being or if anything particular feature is badly needed.
@drahnr I checked, a lot better but still not sufficient yet. Looks like UTF-8 handling still needs some tweak to make it work without panicking.
Found the root cause, v0.3.0-beta.5
should do the trick :tada:
I tried running it: it showed no output for ~three seconds then exited. With -vvvv I get the following output:
$ cargo spellcheck -vvvv
[2020-07-28T20:52:40Z WARN cargo_spellcheck] Loading configuration from /home/joshua/.config/cargo_spellcheck/config.toml, due to: No such file or directory (os error 2)
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Running on inputs ["/home/joshua/src/rust"] / recursive=true
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Processing /home/joshua/src/rust -> /home/joshua/src/rust
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Running on absolute dirs ["/home/joshua/src/rust"]
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Found a total of 1 files to check
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/bootstrap/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/rustc/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/librustc_codegen_llvm/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/cargotest/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/error_index_generator/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/linkchecker/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rust-demangler/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rustdoc/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rls/Cargo.toml from filesystem
[2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rustdoc-themes/Cargo.toml from filesystem
[2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker] Running Hunspell checks
[2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/myspell/ is not a directory
[2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/hunspell/ is not a directory
[2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/myspell/dicts/ is not a directory
It looks like it's skipping almost all the directories? And not running checks on the ones that it did find, because the /usr/share/* paths weren't found? I would expect both of those to be warnings, not DEBUG logging.
I installed hunspell
(with sudo apt install hunspell
) and now it appears to at least be doing something:
error: spellcheck(Hunspell)
--> /home/joshua/src/rust/src/tools/unicode-table-generator/src/skiplist.rs:7
|
7 | This will get packed into a single u32 before inserting into the data set.
| ^^^
|
|
| Possible spelling mistake found.
|
error: spellcheck(Hunspell)
--> /home/joshua/src/rust/src/tools/unicode-table-generator/src/skiplist.rs:14
|
14 | our largest sets are around ~1400 offsets long.
| ^^^^^
|
|
| Possible spelling mistake found.
|
However these warnings aren't particularly helpful - u32
is a builtin type and ~
is not a misspelling, it means approximation. @drahnr is there a way to add a whitelist of keywords or ignore symbols?
I tried running it: it showed no output for ~three seconds then exited. With -vvvv I get the following output:
$ cargo spellcheck -vvvv [2020-07-28T20:52:40Z WARN cargo_spellcheck] Loading configuration from /home/joshua/.config/cargo_spellcheck/config.toml, due to: No such file or directory (os error 2) [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Running on inputs ["/home/joshua/src/rust"] / recursive=true [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Processing /home/joshua/src/rust -> /home/joshua/src/rust [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Running on absolute dirs ["/home/joshua/src/rust"] [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Found a total of 1 files to check [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/bootstrap/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/rustc/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/librustc_codegen_llvm/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/cargotest/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/error_index_generator/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/linkchecker/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rust-demangler/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rustdoc/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rls/Cargo.toml from filesystem [2020-07-28T20:52:40Z DEBUG cargo_spellcheck::traverse] Failed to complete /home/joshua/src/rust/src/tools/rustdoc-themes/Cargo.toml from filesystem [2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker] Running Hunspell checks [2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/myspell/ is not a directory [2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/hunspell/ is not a directory [2020-07-28T20:52:41Z DEBUG cargo_spellcheck::checker::hunspell] Dictionary search path /usr/share/myspell/dicts/ is not a directory
It looks like it's skipping almost all the directories? And not running checks on the ones that it did find, because the /usr/share/* paths weren't found? I would expect both of those to be warnings, not DEBUG logging.
I am missing the exit code above and did it show anything besides the displayed print?
Unfortunately, a cargo manifest does not contain all products, i.e. src/lib.rs
can only be obtained by checking the filesystem. Now cargo_toml
is not entirely reliable (as in the definition of "error" for call of complete_from_path(..)
is not clearly defined in https://docs.rs/cargo_toml/0.8.0/cargo_toml/struct.Manifest.html#method.complete_from_path ) and hence the debug message, since it's not really a concern for now. This does not imply skipping, I guess re-wording is in order.
Adding a debug message and exit early seems to be reasonable if no dictionaries are configured / found, but that's not really what's happening here I think.
I installed
hunspell
(withsudo apt install hunspell
) and now it appears to at least be doing something:
Frankly I never tested it without the presence of any dictionary files, I hoped that the configuration section would be sufficient to avoid this case entierly, but I was wrong :) https://github.com/drahnr/cargo-spellcheck#configuration
error: spellcheck(Hunspell) --> /home/joshua/src/rust/src/tools/unicode-table-generator/src/skiplist.rs:7 | 7 | This will get packed into a single u32 before inserting into the data set. | ^^^ | | | Possible spelling mistake found. | error: spellcheck(Hunspell) --> /home/joshua/src/rust/src/tools/unicode-table-generator/src/skiplist.rs:14 | 14 | our largest sets are around ~1400 offsets long. | ^^^^^ | | | Possible spelling mistake found. |
However these warnings aren't particularly helpful -
u32
is a builtin type and~
is not a misspelling, it means approximation. @drahnr is there a way to add a whitelist of keywords or ignore symbols?
I would expect those to be contained within `
ticks, that would exclude them from being checked, at least for the rust types this should be the case? For pure numbers one can define a custom dictionary, see https://github.com/drahnr/cargo-spellcheck/issues/12 and the referring links for further hints on how to define a custom dictionary.
@jyn514 I think it would be understandable if we add all non-words like u32
with backticks. Or maybe the docs team should get pinged for this?
If this is not sufficient, I'd be open to define a regex set in the config for the hunspell
checker to skip those, but I'd rather avoid it if it can be solved with a custom dictionary.
@jyn514 I think it would be understandable if we add all non-words like
u32
with backticks. Or maybe the docs team should get pinged for this?
I'd be in favour of pinging the docs team.
@jyn514 I think it would be understandable if we add all non-words like
u32
with backticks. Or maybe the docs team should get pinged for this?
Not all of the warnings were for rust types, though. It was flagging various other things like
18 | the bitset to be worthwhile -- for those sets the biset is a 2x size win.
| ^^
| - ex, ix, ax, ox, bx, or Rx
31 | not skipping the small 'gap') is associated into words (u64) and
| ^^^^^
| - map's
43 | dynamically chosen for smallest size), and again deduplicate and store in an
| ^^^^^^^^^^^
| - reduplicate, duplicate, complicated, or delicate
65 | indexed in a separate dataset. That data set stores an index into the
| ^^^^^^^
| - data set, data-set, or database
None of those should be in backticks, and it seems silly to whitelist each individually.
Frankly I never tested it without the presence of any dictionary files, I hoped that the configuration section would be sufficient to avoid this case entierly, but I was wrong :) https://github.com/drahnr/cargo-spellcheck#configuration
Ok, I think that should be fixed before we go further so that new contributors don't have differences between their setup and CI (or at least, know that the differences exist).
I am missing the exit code above and did it show anything besides the displayed print?
The exit code was 0. It did not show any other output.
Unfortunately, a cargo manifest does not contain all products, i.e. src/lib.rs can only be obtained by checking the filesystem. Now cargo_toml is not entirely reliable (as in the definition of "error" for call of complete_from_path(..) is not clearly defined in https://docs.rs/cargo_toml/0.8.0/cargo_toml/struct.Manifest.html#method.complete_from_path ) and hence the debug message, since it's not really a concern for now. This does not imply skipping, I guess re-wording is in order.
I don't really understand this paragraph. What does this have to do with the message 'failed to complete'? Is the whole crate being skipped or just a single file?
I'd be in favour of pinging the docs team.
There isn't really a docs team: https://github.com/rust-lang/team/pull/298. But I can ping @rust-lang/docs anyway in case people still get notifications.
I don't really understand this paragraph. What does this have to do with the message 'failed to complete'? Is the whole crate being skipped or just a single file?
Nothing is skipped, it's just a bit odd behaviour that cargo_toml
tries to lookup src/lib.rs
and src/main.rs
which are not declared in the toml, but seems to error if src
doesn't exist. In the rust repo there are quite a few crates which do not have a source folder and just specify a [[bin]]
- so this would cause an error on the complete_from_path()
call, which would cause the previous debug message - which was just badly phrased, nothing bad was happening there at all :)
18 | the bitset to be worthwhile -- for those sets the biset is a 2x size win. | ^^ | - ex, ix, ax, ox, bx, or Rx 31 | not skipping the small 'gap') is associated into words (u64) and | ^^^^^ | - map's 43 | dynamically chosen for smallest size), and again deduplicate and store in an | ^^^^^^^^^^^ | - reduplicate, duplicate, complicated, or delicate 65 | indexed in a separate dataset. That data set stores an index into the | ^^^^^^^ | - data set, data-set, or database
None of those should be in backticks, and it seems silly to whitelist each individually.
Fully agree, but these are tricky - I think this would mandate some extra quirk code.
If there is no match found by hunspell, but suggestions are available, checking if one of the suggested replacements contains additional -
s but is otherwise equiv to the original word, the entry should be skipped.
Frankly I never tested it without the presence of any dictionary files, I hoped that the configuration section would be sufficient to avoid this case entierly, but I was wrong :) https://github.com/drahnr/cargo-spellcheck#configuration
Ok, I think that should be fixed before we go further so that new contributors don't have differences between their setup and CI (or at least, know that the differences exist).
Could you open a new issue with the relevant information so I can reproduce this within an OCI container?
v0.4.0-alpha.2
is the first pre-release that contains sufficient quirks to cover all the mentioned use-cases :tada: (it's a pre-release so, be aware of :bug: and :beetle: )
@drahnr a point of good news since I know this has mostly been feature requests so far: this caught a typo in rustfmt :)
error: spellcheck(Hunspell)
--> /home/joshua/rustc/src/tools/rustfmt/src/visitor.rs:84
|
84 | Both bounds are inclusifs.
| ^^^^^^^^^
| - inclusions or inclusion
|
| Possible spelling mistake found.
|
cargo-spellcheck
has reached v0.8.3
with a lot of improvements and more config options. It be great to get another round of feedback - with a 1.0.0
release planned for later this year.
@drahnr This sharing the crate to this week in rust and also put a call for partition in this week in rust to ask for feedback. Maybe you can also write a blog about it.
@drahnr This sharing the crate to this week in rust and also put a call for partition in this week in rust to ask for feedback. Maybe you can also write a blog about it.
@pickfire I'll probably only get around to writing a blog post next week at the earliest, but I think that's a good next step.
@rustbot label: +E-help-wanted
Just gave it a try on the libstd. As far as I can see, it's mostly missing words or missing backticks. The full output is here:
error: spellcheck(Hunspell)
--> rust/library/std/src/path.rs:2685
|
2685 | The iterator will yield instances of <code>[io::Result]<[fs::DirEntry]></code>. New
| ^^
| - oi, Io, oo, ii, ion, bio, Rio, or one of 7 others
|
| Possible spelling mistake found.
^ this looks like a bug, it doesn't recognize that code tags are the same as backticks.
error: spellcheck(Hunspell) --> rust/library/std/src/path.rs:2685 | 2685 | The iterator will yield instances of <code>[io::Result]<[fs::DirEntry]></code>. New | ^^ | - oi, Io, oo, ii, ion, bio, Rio, or one of 7 others | | Possible spelling mistake found.
^ this looks like a bug, it doesn't recognize that code tags are the same as backticks.
Could you file an issue, I wasn't aware this is legal syntax and so far nobody else reported an issue for this. I'll fix it ASAP.
Also I'm very late but I just saw
Unfortunately, a cargo manifest does not contain all products, i.e. src/lib.rs can only be obtained by checking the filesystem
You can get the full products with cargo metadata
I think; at least the entry points.
error: spellcheck(Hunspell)
--> rust/library/std/src/time.rs:134
|
134 | In practice such guarantees are – under rare circumstances – broken by hardware, virtualization
| ^
| Possible spelling mistake found.
This is a bug somewhere; not sure whether it's in libstd, spellcheck, or some interaction with @GuillaumeGomez's locale.
My locales:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=
So I think the problem comes from cargo-spellcheck
(going through string characters with something else than .chars()
maybe?).
Also to be noted: cargo-spellcheck
is very slow. It took me a few minutes to get this output.
There is no locale interaction, it assumes UTF-8 for files and assumes en_US locale by default, which is used exclusively to determine the affix and dictionary files for hunspell. The prints are all UTF-8 as well, so that's the only side I think could mess things up. cargo-spellcheck
uses chars()
for sub ranges.
Speed is always improvable, the biggest boost would be to store the constructed tokenizer info, there was an issue https://github.com/drahnr/cargo-spellcheck/issues/57 but the majority of this is marked as done now, since the remaining potential is mostly in the dependenciesnlprule
and hunspell
.
Also note that one only sees output for the issues, not for anything that passed without errors which still requires a significant amount of CPU. If your CPU is idle, that's a different topic.
Please file an issue, if you think there could be an issue or it hampers usability. If something doesn't hit the issue tracker it doesn't exist for me.
So far it has worked ok for small to large code bases.
error: spellcheck(Hunspell) --> rust/library/std/src/path.rs:2685 | 2685 | The iterator will yield instances of <code>[io::Result]<[fs::DirEntry]></code>. New | ^^ | - oi, Io, oo, ii, ion, bio, Rio, or one of 7 others | | Possible spelling mistake found.
^ this looks like a bug, it doesn't recognize that code tags are the same as backticks.
I implemented a workaround and released v0.11.2 . I came across another issue, backlogged as https://github.com/drahnr/cargo-spellcheck/issues/263
From #74141:
It would be great to have a tool that automated spell-checking for the docs. @pickfire suggested
cargo spellcheck
which had some trouble with bootstrap. Maybe we could somehow modify it to work with bootstrap so it runs as part oftidy
?