Open jayaddison opened 1 year ago
I like this one, we already had the issue multiple time with behavior, color, etc.
Thanks @Pierre-Sassoulas - yep, it would be a nice (low priority) improvement. I've been experimenting with implementing it locally, and made some progress, but have found some odd argparse
/ optparse
behaviour and am not certain whether I'll contribute an implementation soon.
have found some odd argparse / optparse behaviour
It would be easier if pylint's input needs to exactly match pyenchant input and is passed to it directly (so if pyenchant handle multidict, then giving a string with multi dict works). Maybe we need to remove our own argparsing, parse spelling args with pyenchant on a dummy text during argparsing and before launching our checks, to be able to display the errors raised by pyenchant.
It seems that although pyenchant
has support for multiple language dictionaries as datasources, it can only spellcheck using one-at-a-time.
Also, the API that pyenchant
provides includes tokenization -- which could be relevant because we would need to decide how to update the existing pylint
spellcheck tokenization.
Some other Python spellcheck libraries like symspellpy
support construction of multi-language spellchecker objects -- but the dictionary file(s) used would be different, so a change in library could result in spellcheck differences for pylint
users.
It would be easier if pylint's input needs to exactly match pyenchant input and is passed to it directly (so if pyenchant handle multidict, then giving a string with multi dict works). Maybe we need to remove our own argparsing, parse spelling args with pyenchant on a dummy text during argparsing and before launching our checks, to be able to display the errors raised by pyenchant.
@Pierre-Sassoulas What would be the purpose of the dummy text?
Not sure what I meant by "dummy text" at the time but looking at the problem again I suppose one way to rephrase it would be "text without signification that we would not try to interpret during configuration parsing", i.e. pylint just being a middleware between user config and pyenchant.
If I understand you correctly, that could mean decoupling the choice validation from enchant
known-dictionary list?
That could make sense. It seems unclear whether the existing spelling-dict
option (and choice validation) are a good place for this.
A few things I've learned from prototyping an implementation:
Also:
pylintrc
) config files tend to use csv
option type for multi-valued items.csv
option type doesn't easily interoperate with choices
.I'm becoming less convinced that this is a worthwhile feature to implement. A single locale to spellcheck during each pylint
run is simple implementation-wise, configuration-wise and also probably easiest to understand.
It might make more sense for users who would like to accept alternate spellings to consider using the existing custom-word-list feature, or to merge dictionary source files locally. Alternatively it may be possible to use the multi-dictionary support in aspell
(and perhaps other spellcheck providers) from enchant
-- I haven't confirmed that, though.
Suggestions (corrections offered from the dictionary) from multiple languages could be confusing.
Unless the dict are en_us
and en_gb
, or fr
, and fr_be
or similar which I think are the only valid reasons to use multiple dicts
If I understand you correctly, that could mean decoupling the choice validation from enchant known-dictionary list?
If we want to be fancy about validations and if I understand correctly what pyenchant is doing, we'd need the ','.join(itertools product)
of what we have now without the empty string, + what we have now. (i.e: if we had ["", "en_us", "en_gb", "fr"]
we now need ','.join(itertools.product(v for f in ["", "en_us", "en_gb", "fr"] if v) + ["", "en_us", "en_gb", "fr"]
... But it only work for combination of 2 dicts not more ! (en_us, en_gb
, not en_us, en_gb, fr
). If we want to provide all the possible combinations we need to redo this n time with n the number of dict available in pyenchant. This feel overly complicated, but it's possible. It's probably better to not provide a "choices" field in argparse and just validate that each element of a csv list of pyenchant are all valids installed pyenchant dict. (Or not provide choices, not validate either, and catch pyenchant errors if/when they happen, which is what I'm suggesting 😄). This would also fix the annoying update of the doc depending on installed pyenchant dict locally. (https://github.com/pylint-dev/pylint/pull/8638#discussion_r1181248636)
Compatibility of the configuration file formats is important for migration purposes
Sure but we did not allow multiple values before, so the current valid configurations all contains valid comma separated list of exactly one value, right ?
It's probably better to not provide a "choices" field in argparse and just validate that each element of a csv list of pyenchant are all valids installed pyenchant dict. (Or not provide choices, not validate either, and catch pyenchant errors if/when they happen, which is what I'm suggesting :smile:). This would also fix the annoying update of the doc depending on installed pyenchant dict locally. (https://github.com/pylint-dev/pylint/pull/8638#discussion_r1181248636)
Compatibility of the configuration file formats is important for migration purposes
Sure but we did not allow multiple values before, so the current valid configurations all contains valid comma separated list of exactly one value, right ?
Yep, OK, that all makes sense, and using unvalidated CSV options would be a straightforward, backwards-compatible approach.
The self-documenting behaviour of the dictionary choices
seems potentially useful in some cases (except when causing unexpected source control changes!). Perhaps it's more valuable at the command-line than in the autogenerated documentation, though.
I have a work-in-progress branch that attempts to maintain the choices
validation while providing multi-locale-within-one-language (en-US, en-CA
ok, fr-FR, fr-CA
ok; xx-AA, yy-AA
not ok). I could open that as a draft pull request if that'd be worthwhile?
I have a work-in-progress branch that attempts to maintain the choices validation while providing multi-locale-within-one-language (en-US, en-CA ok, fr-FR, fr-CA ok; xx-AA, yy-AA not ok). I could open that as a draft pull request if that'd be worthwhile?
Sure, that would be worthwhile, thank you for working on this, much appreciated ! I think if someone want to mix fr/en or es/en we shouldn't prevent them from doing it. But we should prevent pyenchant crashes that would also make pylint crash if the input given to pyenchant is invalid. Do not hesitate to open a draft merge request so we can discuss the code directly :)
Current problem
Recently, spellchecking the word
onwards
using anen_US
failed during apylint.git
continuous integration workflow; it would likely succeed with mosten_GB
dictionaries.The problem is more general, though: codebases may have contributors who read and write using a variety of locales.
Desired solution
The
spelling-dict
option could be extended to accept multiple locale options to spellcheck against.Additional context
pyenchant
supports multi-language spellchecking.