languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

[en] false alarms EN_UPPER_CASE_NGRAM #2947

Open tiff opened 4 years ago

tiff commented 4 years ago

Applying the suggestions would cause inconsistent capitalization:

When is a Book not a Book?

[Screenshot: 2020-05-19 at 07.20.02]

These are the Top 10% Lunch Deals.

[Screenshot: 2020-05-19 at 07.20.32]

Participation Points: 5

[Screenshot: 2020-05-19 at 07.25.11]

Some more:

**Inquiry/Issue: Hello, I have a problem.

It Doesn't Work.

You can order by going to Account -> Order.

Yes (Good Afternoon/Evening).

@danielnaber it would be nice if you could spend some more time on this so that the rule produces fewer false positives.

tiff commented 4 years ago

Here's another example where the suggestion changes only one word and leaves the other words as they are: [Screenshot: 2020-05-19 at 07.29.27]

tiff commented 4 years ago

Another case, where the suggestion creates an inconsistent casing: "clean & Clear"

[Screenshot: 2020-05-19 at 16.00.32]

danielnaber commented 4 years ago

Another case, where the suggestion creates an inconsistent casing: "clean & Clear"

Can you post the original text? I can't reproduce this case.

tiff commented 4 years ago

Johnson & Johnson is headquartered in New Brunswick, New Jersey, the consumer division being located in Skillman, New Jersey. The corporation includes some 250 subsidiary companies with operations in 60 countries and products sold in over 175 countries. Johnson & Johnson had worldwide sales of $70.1 billion during calendar year 2015.[4] Johnson & Johnson's brands include numerous household names of medications and first aid supplies. Among its well-known consumer products are the Band-Aid Brand line of bandages, Tylenol medications, Johnson's Baby products, Neutrogena skin and beauty products, Clean & Clear facial wash and Acuvue contact lenses.

tiff commented 4 years ago

A few more examples where, IMHO, we should not suggest the lowercase word, because the text is either a headline or a very short informal phrase/exclamation.

Happy Camping!

What Happened?

Look and Feel:

Figure/Ground:

Snap or Tap?

tiff commented 4 years ago

They have 100,418 Cases, 4,525 Deaths.

[Screenshot: 2020-05-22 at 14.50.39]

tiff commented 4 years ago

System Granted $5.00

This currently causes the rule to trigger, probably because there's a dot in there that is seen as sentence punctuation. I think we should only look at full sentences (those that have the punctuation at the end, not somewhere in the middle).

danielnaber commented 4 years ago

Sentence detection seems fine in this case. "System Granted $5.00." (with a period at the end) triggers the same false alarm. Maybe count the "real" tokens (word tokens) and ignore short sentences?
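
The "count word tokens and ignore short sentences" idea could look roughly like the sketch below. This is not code from LanguageTool: the class name, the regex used to decide what counts as a word token, and the threshold of 4 are all assumptions chosen for illustration, and a real implementation would presumably work on the tokens the rule already receives rather than on a raw string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LanguageTool.
class ShortSentenceFilter {

    // Assumption: a "real" token is a run of letters, optionally with
    // internal apostrophes ("Doesn't"). Numbers, "$5.00", "10%" are ignored.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['’]\\p{L}+)*");

    // Assumption: sentences with fewer word tokens than this are skipped.
    static final int MIN_WORD_TOKENS = 4;

    static int countWordTokens(String sentence) {
        Matcher m = WORD.matcher(sentence);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    static boolean isTooShortToCheck(String sentence) {
        return countWordTokens(sentence) < MIN_WORD_TOKENS;
    }
}
```

With this threshold, short phrases from this thread such as "System Granted $5.00." or "Happy Camping!" would be skipped, while longer ones such as "These are the Top 10% Lunch Deals." would still be checked, so a length filter alone would not cover every example reported here.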

tiff commented 4 years ago

@danielnaber but it doesn't happen with "System Granted $5", that's why I assumed it's caused by the dot.

tiff commented 4 years ago

This is the code: https://github.com/languagetool-org/languagetool/blob/7eb3807ee8909d4f470dc33857ff9d1f3e311ccc/languagetool-language-modules/en/src/main/java/org/languagetool/rules/en/UpperCaseNgramRule.java#L429-L441

Looks to me as if it's just checking if there's a dot somewhere.
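
To make the distinction concrete, here is a minimal sketch contrasting "there is a dot somewhere" with "the text ends in sentence punctuation". It does not reproduce the linked code in UpperCaseNgramRule.java; the class and method names are made up for illustration, and treating only ".", "!" and "?" as sentence-final is an assumption.

```java
// Hypothetical illustration, not the logic from UpperCaseNgramRule.java.
class SentenceEndCheck {

    // What the check appears to do, per the comment above: any dot counts.
    static boolean containsDot(String text) {
        return text.contains(".");
    }

    // Stricter alternative: only punctuation at the very end counts.
    static boolean endsWithSentencePunctuation(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        char last = trimmed.charAt(trimmed.length() - 1);
        return last == '.' || last == '!' || last == '?';
    }
}
```

For "System Granted $5.00" the first method returns true and the second returns false, which matches the difference observed between "$5.00" and "$5". As noted earlier in the thread, the false alarm also appears for "System Granted $5.00." with a trailing period, so an end-of-text check alone would not remove it.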