retextjs / retext-spell

plugin to check spelling
https://unifiedjs.com
MIT License
73 stars 16 forks source link

Cache suggestions array #21

Closed DeanGaffney closed 3 years ago

DeanGaffney commented 3 years ago

Subject of the feature

Start caching more information such as the array of suggestions returned by nspell so that when a word is pulled from the cache the list of suggestions is available to iterate over.

Problem

If suggestions have already been created for a specific word the reason string is placed in the cache and if that word is checked again the reason string is taken from the cache rather than using nspell. When the cache is used it only contains the reason string and not the array of suggestions.

It would be useful if the cache contained the array of suggestions too so that both the reason and the original array of suggestions are present on the returned Message object.

If we look at the suggestions when we misspell the word hello, the first time we can see the list of suggestions is populated on the message.expected attribute like so:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": [
        "hello",
        "help",
        "helot",
        "halo",
        "held",
        "hell",
        "helm",
        "hero"
      ]
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}

If the word helo is checked again the reason string is pulled from the cache and the returned Message object looks like this:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": []
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}

In the cached object the expected array is missing from message object.

Expected behavior

In the case where a cached result is used the npsell suggestions array should also be cached so that the message.expected array is always populated like so:

{
  "data": {},
  "messages": [
    {
      "message": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "name": "1:1-1:5",
      "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
      "line": 1,
      "column": 1,
      "location": {
        "start": {
          "line": 1,
          "column": 1,
          "offset": 0
        },
        "end": {
          "line": 1,
          "column": 5,
          "offset": 4
        }
      },
      "source": "retext-spell",
      "ruleId": "helo",
      "fatal": false,
      "actual": "helo",
      "expected": [
        "hello",
        "help",
        "helot",
        "halo",
        "held",
        "hell",
        "helm",
        "hero"
      ]
    }
  ],
  "history": [],
  "cwd": "/",
  "contents": "helo"
}

Alternatives

Note

I am willing to do this work, but I just wanted to open an issue first in case there was any reason this should not be done. Thanks!

wooorm commented 3 years ago

Interesting stuff, thanks for offering to work on it and for raising such a wonderful issue. There is some caching already in the code here. So that's not to your liking? Nspell could really use some love and attention soon, as well!

DeanGaffney commented 3 years ago

Hi @wooorm thank you for the quick reply.

I looked through the code and I can see that the only thing that is currently cached is the reason string on the message object, the code does not currently cache the suggestions, which I found here https://github.com/retextjs/retext-spell/blob/main/index.js#L174.

I was wondering if you would be open to also caching the suggestions array from nspell so that the message.expected attribute always contains suggestions even when the result is returned from the cache.

An example of what the cache currently looks like

const word = "helo";
const cache = {};

cache[word] = "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?";

What I am proposing we modify the cache to be:

const word = "helo";
const cache = {};
cache[word] = {
    "reason": "`helo` is misspelt; did you mean `hello`, `help`, `helot`, `halo`, `held`, `hell`, `helm`, `hero`?",
    "suggestions": ["hello", "help", "helot", "held", "hell", "helm", "hero"]
};
wooorm commented 3 years ago

Oh, welp! Yes, I think that is indeed the intention of the code! And caching the list of suggestions instead of the reason is much better: generating the reason from them is trivial in performance compared to generating spell suggestions!

Would appreciate a PR to fix it!

DeanGaffney commented 3 years ago

@wooorm No problem, I have opened a PR for caching both the reason and nspell suggestions here https://github.com/retextjs/retext-spell/pull/22

wooorm commented 3 years ago

Sweet! Released in 4.0.1!!