languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

ignore spelling errors in domain names without "www." #1535

Open tiff opened 5 years ago

tiff commented 5 years ago

I noticed that LanguageTool finds spelling errors in domain names without www.

I think we should ignore them if they end with common TLDs (com, org, gov, edu, net, info, eu, biz, com.mx, de, es, fr, nl, pt, co.uk, uk).
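As a rough illustration of what such a rule might look like, here is a minimal sketch in LanguageTool's disambiguation rule syntax. The rule name and `id` are hypothetical and not part of LanguageTool. The sketch also assumes that a bare domain such as `example.com` is tokenized into three tokens (`example`, `.`, `com`), which may not hold for every language's tokenizer.

```xml
<!-- Hypothetical sketch: name, id and TLD list are illustrative, not an existing LanguageTool rule. -->
<rule name="Ignore spelling of domain names" id="IGNORE_SPELLING_OF_DOMAIN_NAMES">
  <pattern>
    <!-- The domain label, e.g. "example" (assumed to be a single token). -->
    <token/>
    <!-- A literal dot with no space before it. -->
    <token spacebefore="no">.</token>
    <!-- Single-label TLDs only; com.mx and co.uk would need an additional dot-plus-label token pair. -->
    <token spacebefore="no" regexp="yes">com|org|gov|edu|net|info|eu|biz|de|es|fr|nl|pt|uk</token>
  </pattern>
  <!-- Suppress spelling checks for every token matched by the pattern. -->
  <disambig action="ignore_spelling"/>
</rule>
```

One trade-off to keep in mind: short TLDs such as `de` or `es` are also ordinary words in some languages, so a pattern like this could hide a real typo when the space after a sentence-ending period is missing.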

German: [screenshot]

English: [screenshot]

Spanish: [screenshot]

Portuguese: [screenshot]

French: [screenshot]

Polish: [screenshot]

Dutch and Russian are fine.

tiff commented 5 years ago

I think the same should apply to file names: [screenshot]

tiff commented 5 years ago

List of common file extensions: https://www.computerhope.com/issues/ch001789.htm

tiff commented 5 years ago

List of the most common TLDs: https://www.lifewire.com/most-common-tlds-internet-domain-extensions-817511

TiagoSantos81 commented 5 years ago

Hi @tiff

In my opinion, this starts to cross a tricky line in the sand. While the extensions themselves are already covered in some languages and can easily be ported to others (code appended), what precedes them, i.e. the website name or the file name, may sometimes benefit from being spell-checked. I understand the relation to emails and full URLs, but those are usually copied and pasted into the text, so they rarely contain errors. However, file names with full file paths should be ignored for the same reasons emails are, so I will add this to my to-do list, although it won't happen anytime soon.


    <!-- &extensoes_de_ficheiros; is an XML entity holding the file-extension regex; it must be declared in the rule file's DOCTYPE. -->
    <rule name="Ignore spelling of file names" id="IGNORE_SPELLING_OF_FILE_NAMES">
    <!-- Localized from English by Tiago F. Santos, 2018-09-15  -->
      <pattern>
          <token/>
          <token spacebefore="no">.</token>
          <token spacebefore="no" regexp="yes">&extensoes_de_ficheiros;</token><!-- For more extensions, refer to https://fileinfo.com -->
      </pattern>
      <disambig action="ignore_spelling"/>
    </rule>
    <rule name="Ignore spelling of @user mentions" id="IGNORE_USER_MENTION">
      <pattern>
          <token regexp="yes">@.+</token>
      </pattern>
      <disambig action="ignore_spelling"/>
    </rule>
    <rulegroup name="Ignore spelling of #hashtags" id="IGNORE_HASHTAG">
      <rule>
        <pattern>
            <token spacebefore="yes">#</token>
            <token spacebefore="no"/>
        </pattern>
        <disambig action="ignore_spelling"/>
      </rule>
    </rulegroup>

MikeUnwalla commented 5 years ago

I fixed the problem in English for file names that contain underscores (https://github.com/languagetool-org/languagetool/commit/bb50b97c995eb02919842507920b7536ab37d4c4).