overleaf / spelling

The backend spellcheck API that performs spell checking for Overleaf
GNU Affero General Public License v3.0

Remove erroneous words from dictionary #79

Closed: cn-ml closed this issue 3 years ago

cn-ml commented 3 years ago

Steps to Reproduce

  1. (Accidentally) add a word to the dictionary.
  2. That's it. Now your spell checking is forever wrong with no way to return to default.

Expected Behaviour

There should be an option (other than mailing support) to remove incorrect entries from the dictionary, or failing that, at least an option to reset the dictionary completely to its default. Without this feature, spell checking is not only defective but worse than no spell checking at all. This issue has existed for years in the web repository without being addressed, so I'm raising it here again in the hope of bumping its priority.

Observed Behaviour

After you add an erroneous word to your dictionary, Overleaf has a compromised dictionary and will always spellcheck your documents incorrectly, with no way to correct it. This should be treated as a major issue. The resolution cannot simply be to contact support for help; there should be a reproducible way to fix it.

Context

"I added an incorrect entry to the dictionary and because I don't want to think twice about it every time I add something, it's gonna happen again. I'm pretty sure other people have this problem too. I really like overleaf, but this one is honestly making me want to sacrifice the cloud-convenience for a working spell-checker." - @paperbenni Happened to me as well.

Technical Info

No technical info is needed here, I think, but for completeness' sake:

Analysis

Fixes:

Who Needs to Know?

gh2k commented 3 years ago

Thanks for opening this, and writing it up clearly.

The spelling API does actually contain an endpoint to remove words. This repository has no UI of its own, so as far as this repo is concerned I don't think there is any additional work to be done here.
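For illustration, a client could build a request to such an endpoint as below. Treat it as a sketch: the route `/user/<user_id>/unlearn`, the `{"word": ...}` JSON body and the `localhost:3005` base URL are assumptions, not confirmed details of this API, and the code only builds the request without sending it.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL for a locally running spelling service.
BASE_URL = "http://localhost:3005"


def build_unlearn_request(base_url: str, user_id: str, word: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the service to forget `word`.

    ASSUMPTION: the `/user/<user_id>/unlearn` route and the {"word": ...}
    JSON body are illustrative, not confirmed details of the spelling API.
    """
    # Percent-encode the user id so it is safe as a single path segment.
    url = f"{base_url.rstrip('/')}/user/{quote(user_id, safe='')}/unlearn"
    body = json.dumps({"word": word}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_unlearn_request(BASE_URL, "example-user-id", "teh")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running service: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a real deployment.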

The biggest problem we have here is that we run an in-process LRU cache because, without it, the load on the database becomes impractically heavy.
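To show why such a cache relieves the database, here is a minimal sketch of an in-process LRU cache sitting in front of a learned-words lookup. The `LRUCache` class, the `FakeDatabase` stand-in and the capacity are hypothetical; the real service's cache library and settings are not shown in this thread. Repeated checks for the same user are served from memory, so only the first lookup per user reaches the database.

```python
from collections import OrderedDict
from typing import Callable, FrozenSet


class LRUCache:
    """In-process LRU cache that calls `loader` only on a miss."""

    def __init__(self, capacity: int, loader: Callable[[str], FrozenSet[str]]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.loader = loader
        self._entries: "OrderedDict[str, FrozenSet[str]]" = OrderedDict()

    def get(self, user_id: str) -> FrozenSet[str]:
        if user_id in self._entries:
            self._entries.move_to_end(user_id)  # mark as most recently used
            return self._entries[user_id]
        value = self.loader(user_id)  # miss: fall through to the database
        self._entries[user_id] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


class FakeDatabase:
    """Hypothetical stand-in for the learned-words store; counts every read."""

    def __init__(self, learned):
        self.learned = learned
        self.reads = 0

    def load_learned_words(self, user_id: str) -> FrozenSet[str]:
        self.reads += 1
        return frozenset(self.learned.get(user_id, ()))


if __name__ == "__main__":
    db = FakeDatabase({"alice": {"overleaf"}, "bob": {"sharelatex"}})
    cache = LRUCache(capacity=2, loader=db.load_learned_words)
    for user in ["alice", "bob", "alice", "alice", "bob"]:
        cache.get(user)
    print(db.reads)  # 2: only the first lookup per user reaches the database
```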

The problem this causes on the Overleaf SaaS product in production is that we run at least 10 (often more) spelling services to handle the demand. Each has its own cache, which can take a while to expire. So if we provide a UI for editing the spelling dictionary, a change can take a long time (~10 hours or so) to appear everywhere, because each instance keeps serving its cached copy until that entry expires. This causes an awful lot of confusion and is far from optimal.
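The staleness can be reproduced with a small simulation. This is a hedged sketch, not the service's actual code: `MaxAgeCache`, `ManualClock`, the 10-hour `MAX_AGE` and the ten instances are hypothetical values chosen to mirror the figures in this comment, and real entries may also be evicted earlier by the LRU policy. Once a word is unlearned in the database, every instance keeps serving its cached copy until that copy reaches the maximum age.

```python
from typing import Callable, Dict, FrozenSet, Tuple


class MaxAgeCache:
    """Per-instance cache: an entry is served from memory until it is max_age seconds old."""

    def __init__(self, max_age: float, loader: Callable[[str], FrozenSet[str]],
                 clock: Callable[[], float]):
        self.max_age = max_age
        self.loader = loader
        self.clock = clock
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def get(self, user_id: str) -> FrozenSet[str]:
        now = self.clock()
        cached = self._entries.get(user_id)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]  # served from memory, even if the database has changed
        value = self.loader(user_id)  # missing or expired: reload from the database
        self._entries[user_id] = (now, value)
        return value


class ManualClock:
    """Hypothetical controllable clock so the simulation does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    database = {"alice": {"teh"}}  # "teh" was learned by mistake

    def load(user_id: str) -> FrozenSet[str]:
        return frozenset(database.get(user_id, set()))

    MAX_AGE = 10 * 60 * 60  # hypothetical 10-hour max-age
    clock = ManualClock()
    instances = [MaxAgeCache(MAX_AGE, load, clock) for _ in range(10)]
    for instance in instances:
        instance.get("alice")  # every instance caches alice's words at t=0

    database["alice"].discard("teh")  # the word is unlearned in the database

    clock.now = 60.0
    print(sum("teh" in i.get("alice") for i in instances))  # 10: all still stale
    clock.now = MAX_AGE
    print(sum("teh" in i.get("alice") for i in instances))  # 0: all reloaded
```

Shortening the max-age narrows this window, but at the cost of more database reads, which is exactly the trade-off this comment describes.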

We are aware of the problem, and fixing it is on our roadmap for this year. To be honest, we've had a lot of technical debt to clean up since we merged ShareLaTeX and Overleaf, which is why we haven't been able to divert resources to this yet. We recognise that it's an important issue for our users, and it also causes us a headache from a support perspective, as people have to write to us to have their words removed manually.

I'm going to close this ticket here, and copy this comment to the related one. I hope this helps clear this up -- I'm sorry that it's taken us so long to get onto it.

cn-ml commented 3 years ago

Okay, I just looked through the code and found the cache max-age and the unlearn endpoint that I had previously overlooked. Thanks for clearing this issue up.