Open yves-chevallier opened 4 years ago
That isn't supported today, but should be possible to implement.
The SpellingChecker would need to support loading several dictionaries and only reporting an error if the token cannot be found in any of them. It would also have to track suggestions across all dictionaries and include them all.
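As a rough illustration of that behaviour, here is a minimal sketch in plain Python. It is not the sphinxcontrib-spelling implementation: the class name MultiDictionaryChecker and its methods are hypothetical, each dictionary is assumed to be a simple set of lowercase words, and difflib stands in for whatever suggestion mechanism the real backend provides.

```python
import difflib
from typing import Dict, Iterable, List, Set, Tuple


class MultiDictionaryChecker:
    """Hypothetical checker backed by several word sets, one per language."""

    def __init__(self, dictionaries: Dict[str, Set[str]]):
        # Assumption: words are stored in lowercase.
        self.dictionaries = dictionaries

    def is_known(self, word: str) -> bool:
        # A word is accepted as soon as any one dictionary contains it.
        lowered = word.lower()
        return any(lowered in words for words in self.dictionaries.values())

    def suggest(self, word: str, limit: int = 3) -> List[str]:
        # Collect suggestions from every dictionary, dropping duplicates
        # while keeping the order in which they were found.
        merged: List[str] = []
        for words in self.dictionaries.values():
            for candidate in difflib.get_close_matches(
                word.lower(), sorted(words), n=limit
            ):
                if candidate not in merged:
                    merged.append(candidate)
        return merged

    def check(self, tokens: Iterable[str]) -> List[Tuple[str, List[str]]]:
        # Report only the tokens that no dictionary recognizes.
        return [(t, self.suggest(t)) for t in tokens if not self.is_known(t)]


checker = MultiDictionaryChecker(
    {"en": {"hello", "world"}, "fr": {"bonjour", "monde"}}
)
errors = checker.check(["hello", "monde", "helo"])
```

Here only "helo" would be reported, since "hello" and "monde" are each found in one of the two dictionaries.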
The configuration options would need to support specifying multiple languages, as you suggest.
And it might be useful to have a directive to control the dictionaries in use for individual files, although that isn't strictly necessary.
Are you interested in contributing those changes?
I don't know how much I am interested. I am writing quite a long piece of documentation (in French with some English) using Sphinx, and it is very important for me to have a CI job that does a rough spell check. However, I didn't find any good package to do this, and I am not really convinced by enchant, which doesn't have any good tokenizer...
For example, words such as Backus-Naur should be written with a dash and supported in the dictionary as is. Currently I have two words in my dictionary, Backus and Naur, because the tokenizer doesn't understand compound words. Also, some words cannot be written with a capital letter, such as C keywords (while, for, return). sphinxcontrib.spelling should therefore support the text in code-block directives, and it should support the language keywords by default. Another very annoying and important issue with the spelling is the way the user dictionaries work. I would much prefer having support for regex patterns, such as [Ee]at(s?|en)|ate for the verb eat, or [Mm]ange(s|ons|z|nt|ai[st]) for manger in French...
It seems sphinxcontrib.spelling is the best candidate for now, but not a good one for French :(
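To make the regex idea concrete, here is a small sketch of a pattern-based user dictionary. It is only an illustration and not an existing sphinxcontrib.spelling feature: the function names are hypothetical, and each entry is assumed to be a regular expression that must match the whole word.

```python
import re
from typing import Iterable, List, Pattern


def compile_word_patterns(patterns: Iterable[str]) -> List[Pattern[str]]:
    """Compile user-dictionary entries, one regular expression per entry."""
    return [re.compile(p) for p in patterns]


def is_accepted(word: str, compiled: List[Pattern[str]]) -> bool:
    """Accept a word only if some pattern matches it from start to end."""
    return any(rx.fullmatch(word) for rx in compiled)


# The two patterns quoted in the comment above.
user_patterns = compile_word_patterns(
    [
        r"[Ee]at(s?|en)|ate",
        r"[Mm]ange(s|ons|z|nt|ai[st])",
    ]
)

accepted = [
    w
    for w in ["eats", "eaten", "ate", "eating", "mangeons", "mangeait"]
    if is_accepted(w, user_patterns)
]
```

Using fullmatch rather than search keeps a pattern like "ate" from accepting longer words that merely contain it.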
Yes, I suppose the quality of support for French terms depends on the underlying library for tokenizing and the dictionary for various conjugated forms of words.
It would probably be possible to support a tokenizer that recognizes technical terms like Backus-Naur, but I haven't looked into that because I haven't needed it myself yet.
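For what it's worth, a tokenizer that keeps hyphenated compounds together could be sketched roughly as follows. This is a standalone regex-based example, not the enchant tokenizer API; the names WORD_RE and tokenize are made up for illustration.

```python
import re
from typing import List

# One or more letters, optionally followed by "-letters" groups.
# [^\W\d_] matches any Unicode letter, so accented French words are kept whole.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text: str) -> List[str]:
    """Split text into words, treating hyphenated compounds as one token."""
    return WORD_RE.findall(text)


tokens = tokenize("The Backus-Naur form est très utilisée.")
```

With this approach, Backus-Naur reaches the dictionary lookup as a single token, so it can be listed in the word list as is.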
Language-specific terms within code-blocks are interesting. Perhaps the tokenizer for the syntax highlighter could be reused for that.
I should also say that most of the code base for sphinxcontrib-spelling doesn't care about which underlying spelling checker is used, so if there is a different library that works better for other languages we could make that pluggable (either based on the language or based on a new configuration option) and hide the differences in the SpellingChecker class.
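One possible shape for such a pluggable layer is sketched below. Everything here is hypothetical (SpellingBackend, WordListBackend, register_backend, get_backend); it only shows how a backend could be selected per language behind a common interface, not how the project does or will do it.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class SpellingBackend(Protocol):
    """Hypothetical interface that every spelling library wrapper would expose."""

    def check(self, word: str) -> bool: ...

    def suggest(self, word: str) -> List[str]: ...


class WordListBackend:
    """Trivial stand-in backend built on a plain word list."""

    def __init__(self, words: Iterable[str]):
        self.words = {w.lower() for w in words}

    def check(self, word: str) -> bool:
        return word.lower() in self.words

    def suggest(self, word: str) -> List[str]:
        return []  # a real backend would delegate to its library here


# Registry mapping a language code to a factory that builds its backend.
BACKEND_FACTORIES: Dict[str, Callable[[], SpellingBackend]] = {}


def register_backend(language: str, factory: Callable[[], SpellingBackend]) -> None:
    BACKEND_FACTORIES[language] = factory


def get_backend(language: str, default: str = "en") -> SpellingBackend:
    # Fall back to the default language when no specific backend is registered.
    factory = BACKEND_FACTORIES.get(language, BACKEND_FACTORIES[default])
    return factory()


register_backend("en", lambda: WordListBackend(["hello", "world"]))
register_backend("fr", lambda: WordListBackend(["bonjour", "monde"]))
french = get_backend("fr")
```

The rest of the code would then only call check and suggest, without knowing which library sits behind them.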
I vote for this feature. For now I use a workaround: a merged dictionary (en+ru).
I have a document written in French and English. Is it possible to have something like: