wooorm / nspell

📝 Hunspell compatible spell-checker
MIT License

Question: how to overwrite nspell.personal dictionary? #24

Closed GrayedFox closed 4 years ago

GrayedFox commented 4 years ago

Hello there Titus Wormer, first of all a big thanks! Thanks to your native JavaScript Hunspell wrapper I have been able to develop a smart language detecting spell checker plugin without doing any of the NLP heavy lifting. Awesome, thank you, keep up the great work!

Now for the fun part :smile_cat:: I'm using nspell.personal() to add a list of user-added words from browser storage.

Here are the offending lines from my user class:

  // add word to user.ownWords and update existing spell checker instances
  addWord (word) {
    this.ownWords.push(word)
    Object.keys(this.spellers).forEach((language) => {
      this.spellers[language].personal(this.ownWords.join('\n'))
    })
  }

  // remove word from user.ownWords and update existing spell checker instances
  removeWord (word) {
    this.ownWords.remove(word)
    Object.keys(this.spellers).forEach((language) => {
      this.spellers[language].personal(this.ownWords.join('\n'))
      // assuming personal() will here overwrite existing dictionary given lack of add/remove methods
    })
  }

ownWords is just an array of strings with no affix rules or slashes for modelling: just words users think are words (strings cleaned of numeric and non-word characters). I would like to know how to overwrite the existing personal dictionary or, alternatively, how to remove a word from a personal dictionary once it has been added. nspell.personal.remove() is undefined (as expected, and per the docs). From what I understand of the personal file, is the nspell.personal() method actually just a wrapper around nspell.add()?

Referring to this: https://github.com/wooorm/nspell/blob/c92902162193084027a5a8b7259702af90804528/lib/personal.js#L41

To provide the final bit of context: I was originally just adding all words from user.ownWords to each nspell instance with nspell.add(). The problem is that if a user adds the word "dark" and has British English, American English, and German spellers enabled, that word is added to all of those dictionaries. When the user removes the word, it is also removed from the British and American English dictionaries, and now "dark" is marked as incorrect in those languages. Enter nspell.personal(). Since I don't persist changes to core dictionaries, the bug thankfully goes away after restarting the plugin/browser, but I'm digging deep on this one!

One of the features of the extension is that users can add a word in one language to have it marked as correct in all the languages they use, and I'd love to keep it that way.

GrayedFox commented 4 years ago

Note: I extend the native array prototype with a custom remove() method (if Array.prototype.remove is undefined) just in case you spot that and rightly point out there is normally no such method.

wooorm commented 4 years ago

Hi there Che! Thanks for the kind words!

I think ownWords should be several lists: one per language. Because a word I add to Dutch, shouldn’t be a word in English.

If you’re not dealing with true “personal” dictionaries, I’d say use .add (and .remove) directly! Could that work for your case?

GrayedFox commented 4 years ago

My pleasure :)

Hmmm, you make a fair point that a custom word should perhaps keep its language as context. From a user perspective rather than a linguistic one: I imagine it won't necessarily bother someone to see that, after they have added "shizzle" to their custom word list, shizzle is now marked as correct in Dutch and English and any other language they employ, if the point of adding the word is just to make sure it never gets marked as incorrect. Admittedly I'm assuming a lot on the user's behalf here :shrimp:

At the moment, using .add and .remove (which I was doing before) is a bit tricky since it results in the same bug, and I was hoping .personal might resolve the issue somehow.

Let's say someone is using English and Dutch spellers. The bug reproduction steps would be the following:

  1. User uses Dutch and English dictionaries
  2. User adds "geluk" to own words (calling nspell.add), which adds it to the Dutch dictionary (unnecessary since it's already correct in Dutch, as far as I know) and the English dictionary (necessary to see geluk marked correct in English)
  3. User removes "geluk" from own words (calling nspell.remove), which also removes it from all in-use dictionaries. Dutch now marks geluk as incorrect.
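
The steps above can be reproduced with a small stand-in for a speller. `FakeSpeller` below is hypothetical and is not nspell: it only mimics the `correct`, `add`, and `remove` calls used in this thread, on the assumption that `remove` drops a word regardless of how it got into the dictionary. The word lists are made up for illustration.

```javascript
// Hypothetical stand-in for an nspell instance (NOT the real library).
// Assumption: remove() drops the word no matter how it entered the dictionary.
class FakeSpeller {
  constructor (words) {
    this.words = new Set(words)
  }

  correct (word) {
    return this.words.has(word)
  }

  add (word) {
    this.words.add(word)
    return this
  }

  remove (word) {
    this.words.delete(word)
    return this
  }
}

// Step 1: the user has Dutch and English spellers enabled.
const spellers = {
  'nl-nl': new FakeSpeller(['geluk', 'donker']),
  'en-au': new FakeSpeller(['luck', 'dark'])
}

// Step 2: "geluk" is added to every speller, including Dutch, which already knew it.
Object.values(spellers).forEach(speller => speller.add('geluk'))

// Step 3: "geluk" is removed from every speller.
Object.values(spellers).forEach(speller => speller.remove('geluk'))

// Dutch has now lost a word that belonged to its core dictionary.
const dutchStillKnowsGeluk = spellers['nl-nl'].correct('geluk')
console.log(dutchStillKnowsGeluk) // false
```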

I was under the impression that the user's personal dictionary was there to let users keep a list of custom words without touching the core language dictionaries. By using nspell.personal, I can nearly get the desired effect: the code detects the content language and matches that with an nspell instance, then we call nspell.correct, which also checks any words added via nspell.personal. The problem is that there doesn't seem to be any way to remove words from the personal list once they are added, unless I'm missing something!

GrayedFox commented 4 years ago

I think I can work around this behaviour by changing the custom word list to be an object of arrays that looks like this (and, when adding a word to the user's list of custom words, checking which in-use languages the word is already correct in):

{
  shizzle: ['de-de', 'en-au', 'nl-nl'],
  geluk: ['de-de', 'en-au']
}

Happy to do this, as it's not too much trouble, but then I'm not sure what the purpose of .personal is, to be honest :)

wooorm commented 4 years ago

Right, I think I understand.

Unfortunately, this is not something that exists in Hunspell, or projects based on it. There is no revision history built in that can be undone in steps.

Hunspell understands one language at a time, such as English, based on a generated dictionary (see wooorm/dictionaries). Users do have some extra control: they can have an extra file, in the “personal” format, that adds or prohibits some words. On a Mac, for AppleSpell, that’s located in ~/Library/Spelling/en_US in this case. There is also an extra file that works in any language: ~/Library/Spelling/LocalDictionary. But when any information changes, in the language dictionary or in one of the personal dictionaries, AppleSpell must be restarted! (related: https://github.com/wooorm/osx-learn).

For a solution, maybe re-initialize the instance when someone closes the dialog?
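
One way to read that suggestion: treat the custom word list as the source of truth and rebuild the speller from the untouched base dictionary whenever the list changes. The sketch below is only an illustration under assumptions. `createSpeller` and `rebuildSpeller` are hypothetical names; in a real plugin the factory would construct a fresh nspell instance from the language's dictionary data, whereas here it is a tiny stub so the example runs on its own.

```javascript
// Hypothetical factory standing in for "build a fresh speller from the
// base dictionary". It is a stub, not nspell.
function createSpeller (baseWords) {
  const words = new Set(baseWords)
  return {
    correct: word => words.has(word),
    add (word) {
      words.add(word)
      return this
    }
  }
}

// Rebuild from scratch: base dictionary first, then the current custom words.
// Nothing is ever removed from a live instance, so core words cannot be lost.
function rebuildSpeller (baseWords, ownWords) {
  const speller = createSpeller(baseWords)
  ownWords.forEach(word => speller.add(word))
  return speller
}

const dutchBase = ['geluk', 'donker']
let ownWords = ['geluk', 'shizzle']
let dutch = rebuildSpeller(dutchBase, ownWords)

// The user removes both custom words, e.g. just before the dialog closes.
ownWords = []
dutch = rebuildSpeller(dutchBase, ownWords)

console.log(dutch.correct('geluk'))   // true: still part of the base dictionary
console.log(dutch.correct('shizzle')) // false: no longer a custom word
```

Rebuilding means loading the dictionary again, so doing it once when the dialog closes is likely cheaper than doing it on every single edit.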

GrayedFox commented 4 years ago

Ah okay, thanks that has helped me understand. Personal seems like it is there so that a list of user words can be kept somewhere on disk and ported around, but it's only ever read (or set) once.

I worked around the issue and now use the following code (which has helped me find a new bug!). I will close this issue and leave the workaround code here for any future eyeballs:

  // add word to user.ownWords and update existing spell checker instances
  addWord (word, languages) {
    const misspeltLangs = []
    languages.forEach(language => {
      const speller = this.spellers[language]
      if (!speller.correct(word)) {
        speller.add(word)
        misspeltLangs.push(language)
      }
    })
    if (misspeltLangs.length > 0) this.ownWords[word] = misspeltLangs
  }

  // remove word from user.ownWords and update existing spell checker instances
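  removeWord (word) {
    // a word that was never added has no entry, so fall back to an empty list
    const languages = this.ownWords[word] || []
    languages.forEach(language => {
      if (this.spellers[language]) this.spellers[language].remove(word)
    })
    delete this.ownWords[word]
  }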

The above code is just a sample for one way to manage adding and removing words from an nspell instance (if you are using multiple, as I am) in a class. Here, the class instantiates nspell instances for multiple languages and keeps track of which languages the custom word is actually misspelt in, so that calling user.removeWord() only removes the custom word from languages it was never originally correct in.
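
A usage sketch of this workaround, under stated assumptions: the two methods are wrapped in a hypothetical `User` class, and `FakeSpeller` is a made-up stand-in that only mimics the `correct`, `add`, and `remove` calls rather than the real nspell library.

```javascript
// Hypothetical stand-in for an nspell instance; only correct/add/remove are mimicked.
class FakeSpeller {
  constructor (words) {
    this.words = new Set(words)
  }

  correct (word) {
    return this.words.has(word)
  }

  add (word) {
    this.words.add(word)
    return this
  }

  remove (word) {
    this.words.delete(word)
    return this
  }
}

// Hypothetical wrapper class; the method bodies closely follow the workaround.
class User {
  constructor (spellers) {
    this.spellers = spellers
    this.ownWords = {} // word -> languages the word was added to
  }

  addWord (word, languages) {
    const misspeltLangs = []
    languages.forEach(language => {
      const speller = this.spellers[language]
      if (!speller.correct(word)) {
        speller.add(word)
        misspeltLangs.push(language)
      }
    })
    if (misspeltLangs.length > 0) this.ownWords[word] = misspeltLangs
  }

  removeWord (word) {
    const languages = this.ownWords[word] || []
    languages.forEach(language => {
      if (this.spellers[language]) this.spellers[language].remove(word)
    })
    delete this.ownWords[word]
  }
}

const user = new User({
  'nl-nl': new FakeSpeller(['geluk']),
  'en-au': new FakeSpeller(['luck'])
})

// "geluk" is already correct in Dutch, so it is only recorded for English.
user.addWord('geluk', ['nl-nl', 'en-au'])
const recordedFor = [...user.ownWords.geluk]

// Removing it only touches the languages it was recorded for.
user.removeWord('geluk')
console.log(user.spellers['nl-nl'].correct('geluk')) // true
console.log(user.spellers['en-au'].correct('geluk')) // false
```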