jupyterlab-contrib / spellchecker

Spellchecker for JupyterLab notebook markdown cells and file editor.
BSD 3-Clause "New" or "Revised" License

Suggest words from personal dictionary, add custom dictionaries #75

Open krassowski opened 3 years ago

krassowski commented 3 years ago

Typo.js has its limitations (issues with German, no support for adding custom words, no support for combining multiple dictionaries (e.g. English + medical English)) and we may want to consider alternatives. The two packages I find promising are:

I think that moving to nspell might be a good idea. I will leave this up for discussion for two weeks to get feedback before making an attempt, but if anyone feels like doing it earlier please do feel welcome to do so.
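
To make the two missing features concrete (adding custom words and combining several dictionaries), here is a minimal sketch. It is not the API of Typo.js, nspell, or this extension: `CombinedChecker`, `addWord`, and `isCorrect` are hypothetical names, and the dictionaries are plain word sets rather than real Hunspell data with affix rules.

```typescript
// Hypothetical sketch: these names do not come from Typo.js, nspell, or the extension.
type WordList = ReadonlySet<string>;

class CombinedChecker {
  private readonly dictionaries: WordList[];
  private readonly personal = new Set<string>();

  constructor(dictionaries: WordList[]) {
    this.dictionaries = dictionaries;
  }

  /** Add a custom word at runtime, e.g. from a user's personal word list. */
  addWord(word: string): void {
    this.personal.add(word);
  }

  /** A word is accepted if the personal list or any dictionary contains it. */
  isCorrect(word: string): boolean {
    if (this.personal.has(word)) {
      return true;
    }
    // Dictionary entries are assumed to be lowercase, so "Patient" matches "patient".
    const lower = word.toLowerCase();
    return this.dictionaries.some(d => d.has(word) || d.has(lower));
  }
}

// Example: English plus a small "medical English" list, plus one custom word.
const english: WordList = new Set(["the", "patient", "has"]);
const medical: WordList = new Set(["tachycardia"]);
const checker = new CombinedChecker([english, medical]);
checker.addWord("LiDAR");
```

With this setup `checker.isCorrect("tachycardia")` and `checker.isCorrect("LiDAR")` are both accepted, while a word in neither list is rejected. A real replacement would also need suggestions and affix handling, which this sketch leaves out.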

ocordes commented 3 years ago

Typo.js also has problems with Italian dictionaries, and with any dictionary that has a complex structure, such as many prefixes. I will check how nspell performs with such dictionaries. It may be a very good alternative.

richtong commented 2 years ago

I was just wondering about this: are there custom dictionaries at all? I'm adding a zillion things to the Ignore list, but these remain highlighted, at least in JupyterLab with the latest spellchecker. I can see the ignored words in the JSON config file, so I'm wondering whether there is a mechanism to add custom dictionaries in the right format. With vi, there is a simple text list in a spell file. Is that what I should be doing here, adding a local dictionary?

krassowski commented 2 years ago

Based on this issue still being open, there isn't.

"I'm adding a zillion things to the Ignore list, but these remain highlighted, at least in JupyterLab with the latest spellchecker"

@richtong could you provide an example?

richtong commented 2 years ago

Sure, the current Typo.js doesn't handle abbreviations for numbers, or company names. As a workaround I guess I could create my own Hunspell dictionary, but it looks like multiple dictionaries aren't supported, so I would need to add on to en_US or some other dictionary. Here are some common uses:

1M
100M
2B
300K
1000x
Doesn't

You get the idea. And the dictionary doesn't include various technical abbreviations like:

APIs
Web3
dApps
IoT (Internet of Things)
NLP (natural language processing)
2D (although 2-D works, it's not that common in our field)
4D (Four dimensional world that is you can go back and forth in time)
sUAVs (small unmanned aerial vehicles)
Convolutional
Downsampling
Scalable
Photogrammetry
Dataset
Raycasting
Lossy
Atmos
VR
LiDAR
2FA
RFID
Biometrics

Of course, because Jupyter (at least for us) is for scientific and business stuff, there are a lot of these. Also, if you put a word on the Ignore list, at least with the latest release, it is still highlighted as a misspelling even though it is on the list, so my list ended up with lots of dupes before I figured this out.

And of course various company names

Zipline
Quota
SDL
Zillow

krassowski commented 2 years ago

I mean that this is not how the ignore list is supposed to work. It should have prevented things from getting highlighted. If it does not please provide a specific example and open a new issue.
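
For reference, the intended behaviour can be sketched as follows. This is only an illustration under stated assumptions, not the extension's actual code: `findMisspellings`, `isKnown`, and `ignored` are hypothetical names, and the tokenizer is a deliberately simple ASCII-only pattern.

```typescript
// Hypothetical sketch; the names are illustrative, not the extension's API.
function findMisspellings(
  text: string,
  isKnown: (word: string) => boolean,
  ignored: ReadonlySet<string>
): string[] {
  // Simple ASCII tokenizer: letters and digits, with inner apostrophes or hyphens.
  const tokens: string[] = text.match(/[A-Za-z0-9][A-Za-z0-9'-]*/g) ?? [];
  // A token is reported only if it is neither ignored nor known to the dictionary.
  return tokens.filter(token => !ignored.has(token) && !isKnown(token));
}

// Example: the ignored words are skipped, so only the real typo is reported.
const knownWords = new Set(["use", "and"]);
const ignoreList = new Set(["Web3", "dApps", "NLP"]);
const flagged = findMisspellings(
  "Web3 dApps use NLP and remina.",
  word => knownWords.has(word),
  ignoreList
);
```

Here `flagged` contains only `"remina"`. If an ignored word still shows up as highlighted in practice, that points to a bug rather than to the design.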

jangenoe commented 7 months ago

FYI: Zettlr went from typo.js (v1.0.0) to nspell (v1.1.0) to nodehun (current Zettlr version)

The file changes in the commits (under the "to" links above) seem to indicate that the required changes are not extensive.