sublimehq / sublime_text

Issue tracker for Sublime Text
https://www.sublimetext.com

Don't spell check file extensions or other .dotnames #5097

Open eugenesvk opened 2 years ago

eugenesvk commented 2 years ago

Problem description

Currently some non-words, e.g., file extensions that start with a period and have no space before them, are spell checked and marked in red, so this settings entry shows up almost entirely red:

"binary_file_patterns":[ "*.3gp","*.3gpp","*.7z","*.7zip","*.aac","*.accda","*.accdb","*.accde","*.accdr","*.accdt","*.adn","*.aep","*.aet","*.ai","*.aiff","*.au","*.bin","*.dbf","*.dds","*.dll","*.doc","*.docm","*.docx","*.docxml","*.dotm","*.dotx","*.eot","*.exe","*.flac","*.gif","*.gz","*.h264","*.ico","*.idml","*.indb","*.indd","*.indl","*.indt","*.inx","*.jar","*.jpeg","*.jpg","*.laccdb","*.lnk","*.m4a","*.m4p","*.m4v","*.maf","*.mam","*.maq","*.mar","*.mat","*.mdb","*.mdw","*.mkv","*.mov","*.mp+","*.mp3","*.mp4","*.mpc","*.mpeg","*.mpg","*.mpp","*.odg","*.ods","*.oga","*.ogg","*.ogm","*.ogv","*.one","*.opus","*.otf","*.pdf","*.png","*.pps","*.ppsx","*.ppt","*.pptx","*.prel","*.prproj","*.psb","*.psd","*.psq","*.pzip","*.ra","*.rar","*.rc","*.rm","*.saz","*.sln","*.swf","*.tar","*.tga","*.tgz","*.ttf","*.vcxproj","*.wav","*.webm","*.wim","*.wma","*.wmf","*.wmv","*.woff","*.woff2","*.wri","*.xls","*.xlsb","*.xlsm","*.xlsx","*.xlt","*.zip"]

Preferred solution

I'd prefer to be able to set additional exclusion rules for spell checking, e.g., ignore words starting with a period in a list of contexts X, Y (I'm certain that in some other contexts those word.word sequences would mostly be real words, not file extensions), just as there are currently contexts for globally enabling spell check.

Alternatives

Adding all the extensions to an exclusion list, but there are many extensions :) and these dotwords are not always extensions, so a general rule is preferable to a word list.

Additional Information

No response

BenjaminSchaaf commented 2 years ago

There's already "spelling_selector" for determining where spell checking is done. I don't think it makes sense to have "after a dot" be any different to any other situation as there's plenty of cases where text after a dot should be spell checked.

eugenesvk commented 2 years ago

> There's already "spelling_selector" for

I mentioned it, and it's useless for the issue I'm describing because the text scope is the same (e.g., a plain text file, a Markdown file, or a code comment).

> as there's plenty of cases where text after a dot should be spell checked.

Unless there aren't? The comparative plentifulness is completely use(r)-dependent, and a one-size-fits-all judgment doesn't fit. For most of my text use cases I'd rarely need it to spell check the .dotwords, while there are plenty of examples of false positives. And at any rate, whenever I do need it, I'd apply a scope limit to the no-dot rule. And if the scope doesn't change for the .dotwords in the context where I need it, even then I'd still be better off: I can always disable the no-dot rule if I see that it hurts me more than it helps!

At the moment I often have to disable real-time spell checking altogether, and then re-enable it from time to time, because too many false positives light up my files, on top of the annoying false positives on almost every single word (as mentioned in https://github.com/sublimehq/sublime_text/issues/4070).

BenjaminSchaaf commented 2 years ago

> For most of my text use cases I'd rarely need it to spell check the .dotwords, while there are plenty of examples of false positives. And at any rate, whenever I do need it, I'd apply a scope limit to the no-dot rule. And if the scope doesn't change for the .dotwords in the context where I need it, even then I'd still be better off: I can always disable the no-dot rule if I see that it hurts me more than it helps!

And for many other use cases there are all sorts of situations where spell checking is not desired; having a special case for each is not actionable. A more generic solution using regexes that can provide contextual information would be required, which is exactly what syntax definitions and scopes are. If you've got a specific use case where \.\w+ should not be spell checked I'd suggest making a syntax definition to scope those such that they can be ignored by the "spelling_selector" setting.

eugenesvk commented 2 years ago

> If you've got a specific use case where \.\w+ should not be spell checked I'd suggest making a syntax definition to scope those such that they can be ignored by the "spelling_selector" setting.

Yes, good idea. My specific use case is to exclude, e.g., (\.\w{1,4})\b matches everywhere, as I can't imagine ever needing to spell check such short dotwords, though I'd most likely start with your broader example as well and see if that brings any trouble. This might even help in partially dealing with the other issue mentioned above, as it could be used to ignore words at the end of a line, or words without a space or punctuation mark at the end, so it would be very helpful for a lot of newly typed text.
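
To make the difference between the two patterns in this thread concrete, here is a minimal Python sketch that runs the broader \.\w+ and the narrower (\.\w{1,4})\b over a sample sentence. The helper name `dotwords` and the sample text are made up for illustration, and Python's `re` module is only a stand-in: Sublime Text does not expose a spell check regex filter like this.

```python
import re

# Broad pattern suggested in the thread: a dot followed by any run of word characters.
BROAD = re.compile(r"\.\w+")
# Narrow pattern from the comment above: a dot plus at most four word
# characters, ending at a word boundary.
SHORT = re.compile(r"(\.\w{1,4})\b")


def dotwords(pattern, text):
    """Return every substring of `text` matched by `pattern` (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


sample = "Open notes.txt and archive.tar.gz, skip project.vcxproj. Done."

print(dotwords(BROAD, sample))  # includes the long ".vcxproj"
print(dotwords(SHORT, sample))  # skips ".vcxproj" because it exceeds four characters
```

Because \w also covers digits and underscores, both patterns match extensions such as .mp3 or .h264; the trailing \b is what stops the narrow pattern from matching the first four characters of a longer dotword.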

How do I create a global syntax definition like that, one that would be prepended to every single syntax definition out there without messing up anything else?

eugenesvk commented 2 years ago

FYI, here is an example of the proper approach to spell check filters that would match the plentifulness you mentioned. Is this kind of global spell check override something that I can replicate with the syntax definitions you suggested?

https://github.com/bartosz-antosik/vscode-spellright/blob/master/README.md

"spellright.ignoreRegExps": []

Regular expressions ignored in spelling. Allows to ignore/consider as spelled correctly generalized expressions. Works on raw document before separating words to spell which allows to ignore larger parts of the document. Regular expressions have to be in double quoted JavaScript regular expression format. That is backslash has to be quoted as well e.g.: "/(\\.?)(gif|png)/g" to ignore file extensions like ".gif" and ".png".

"spellright.ignoreRegExpsByClass": {}

Extends setting of "spellright.ignoreRegExps" per document type. Accepts object of key-multi-value pairs. For example following settings:

"spellright.ignoreRegExpsByClass": {
    "markdown": [ "/&amp;/g", "/&nbsp;/g" ],
    "cpp": [ "/#include\\s+\\\".+\\\"/g" ],
    "html": [ "/<script>[^]*?</script>/gm" ],
    "latex": [ "/\\\\begin{minted}[^]*?\\\\end{minted}/gm" ]
}
  • avoid spelling of &amp; and &nbsp; literals in markdown documents;
  • avoid spelling of strings in #include "file" construct in CPP documents;
  • avoid spelling of multiline <script></script> tag content in HTML documents;
  • avoid spelling of "minted" code blocks in LaTeX documents.
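
The "works on raw document before separating words" idea quoted above can be sketched in a few lines of Python. This is an illustrative assumption about how such a filter might work, not the actual implementation of Spell Right or of Sublime Text; the function names, the toy word set `KNOWN`, and the sample text are all hypothetical.

```python
import re


def mask_ignored(text, ignore_patterns):
    """Blank out every match with spaces of equal length, so word offsets stay valid."""
    for pattern in ignore_patterns:
        text = pattern.sub(lambda m: " " * len(m.group(0)), text)
    return text


def misspelled(text, known_words, ignore_patterns=()):
    """Return (offset, word) pairs for words that are not in `known_words`."""
    masked = mask_ignored(text, ignore_patterns)
    return [
        (m.start(), m.group(0))
        for m in re.finditer(r"[A-Za-z]+", masked)
        if m.group(0).lower() not in known_words
    ]


# Toy dictionary standing in for a real spell check dictionary (assumption).
KNOWN = {"save", "the", "image", "as", "a", "or", "file"}
# Python equivalent of the quoted "/(\\.?)(gif|png)/g" example.
IGNORE = [re.compile(r"(\.?)(gif|png)")]

text = "Save the image as a .png or .gif file"
print(misspelled(text, KNOWN))          # flags "png" and "gif"
print(misspelled(text, KNOWN, IGNORE))  # nothing flagged once the matches are masked
```

Note that a pattern like this one has no word boundary, so it would also blank out "gif" inside a longer word such as "gift"; a stricter pattern would be needed to avoid that.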

BenjaminSchaaf commented 1 month ago

Yes, you can achieve that using syntax-specific settings. For instance, the Markdown one would use "- constant.character.entity.named.html" as the selector.

eugenesvk commented 1 week ago

You're using a predefined selector, not a custom one like this.is.my.custom.file.extension, and I can't add such a custom selector without manually overriding every single syntax one by one (and then tracking changes).

BenjaminSchaaf commented 3 days ago

Right, there doesn't seem to be any solution that works well enough. I'll reopen the issue, but note this is unlikely to be worked on any time soon.