FR: Auto-correct should ignore words in all caps (aka acronyms)

jjspace commented 2 months ago

Is Your Feature Request Related to a Problem? Please Describe.

I was writing up some notes about insurance benefits and wrote HSA which is a perfectly valid acronym but the rule changed this to HAS. This is not a good correction. I had to add hsa as a word to ignore but in normal writing I would want that to get replaced with has.

Describe the Solution You'd Like

I think that words that are only uppercase should be ignored. Or at least there should be an option to toggle this. I think larger scale the autocorrect should be case sensitive in general but that seems less likely to be added.

Describe Alternatives You've Considered

I've used the Auto Correct VSCode extension for a long time and have always enjoyed how it handled things and wish this rule worked like that does.

pjkaufman commented 2 months ago

Hey @jjspace . Thanks for the feedback. I am open to adding a setting that would say that if a word is entirely uppercase, then it would be ignored by autocorrect. That seems to make sense to me. The logic that the Linter used as the base for the rule did not have exceptions like this from my recollection which is why the Linter did not have this before.

I also see that you have some other features that you may be hoping for based on

I've used the Auto Correct VSCode extension for a long time and have always enjoyed how it handled things and wish this rule worked like that does. Could you elaborate on what functionality you wished was more similar?

pjkaufman commented 2 months ago

Also, as for the acronym ignore, do you think https://github.com/platers/obsidian-linter/issues/1075 covers the idea of what an acronym is?

jjspace commented 2 months ago

I am open to adding a setting that would say that if a word is entirely uppercase, then it would be ignored by autocorrect

I personally think that's probably one of the best options short term.

Could you elaborate on what functionality you wished was more similar?

Also, as for the acronym ignore, do you think https://github.com/platers/obsidian-linter/issues/1075 covers the idea of what an acronym is?

The more I think about this the more I think that only comparing strings after converting to lowercase is a mistake in general. It may help specifically in the case where the word is the start of a sentence or used as a title/proper noun but I think it gets in the way in many other cases. I do think that other issue captures what an acronym is including ones like ToS which I didn't think of including upper and lower case letters. I think that further highlights why this whole rule should be case sensitive. If this causes an issue with the default word list you could just include all those replacements as well as all those replacements with the first letter capitalized. That would cover all normal cases and all cases where the word is used at the beginning of a sentence or title but would not cover all the cases of mixed case or all caps, like in acronyms. For my example above that would mean the default list has replace ['hsa', 'has'] and ['Hsa', 'Has'] which would not trigger for 'HSA'

pjkaufman commented 2 months ago

Could you elaborate on what functionality you wished was more similar? Also, as for the acronym ignore, do you think #1075 covers the idea of what an acronym is?

The more I think about this the more I think that only comparing strings after converting to lowercase is a mistake in general. It may help specifically in the case where the word is the start of a sentence or used as a title/proper noun but I think it gets in the way in many other cases. I do think that other issue captures what an acronym is including ones like ToS which I didn't think of including upper and lower case letters. I think that further highlights why this whole rule should be case sensitive. If this causes an issue with the default word list you could just include all those replacements as well as all those replacements with the first letter capitalized. That would cover all normal cases and all cases where the word is used at the beginning of a sentence or title but would not cover all the cases of mixed case or all caps, like in acronyms. For my example above that would mean the default list has replace ['hsa', 'has'] and ['Hsa', 'Has'] which would not trigger for 'HSA'

I am not necessarily opposed to changing the algorithm. However, duplicating the default list is out of the question. Duplicating the default list would almost double the size of the Linter again (the default list is almost 1 MB in size as it stands). If the list were downloaded separately or only downloaded when the auto-correct rule was enabled for the first time, then I would be fine with duplicating the list and converting it into a different format that would be have one set of words for just lowercase and one for just an uppercase first letter.

The main issue that I do not think I follow is to what extent the change is beneficial. It sounds like it just benefits the Acronyms use case. Is that correct? Or are there other use cases? If it is just for acronyms, we can try to better address that scenario, but I would want to understand the problem as a whole better before moving forward.

If the algorithm were changed to be case sensitive, it would be a breaking change and would necessitate a major version change. I am fine with this, but thought I would note that just in case that has any bearing.

pjkaufman commented 1 month ago

I am working on adding an option that will allow you to make acronyms/capitalizations other than the first letter of the word result in ignoring the word for custom auto-correct. I will hopefully have something for the next release.

pjkaufman commented 1 month ago

I believe I have a fix for the acronyms issue. There will be no way to auto-correct acronyms as it stands, but this can be changed in the future if this is a feature desired by the community. Again, feel free to recommend changes to auto-correct that you believe would improve it with use cases provided. Thanks!

pjkaufman commented 1 month ago

I have merged the PR to add ignoring acronyms/words with multiple capital letters in them. It should be on master and in the next release. Please let us know if either has an issue.

platers / obsidian-linter