Lazy-load dictionaries, integrate with language packs

krassowski commented 3 years ago

Currently we pre-load all dictionaries because they are part of the pre-built bundle. This is wasteful, as an average user does not use more than 1-3 languages at once. This is also highlighted by the asset size warning:

WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).
This can impact web performance.
Assets: 
  fea358c2059d3d479bb1194ad85cfbb1.dic (538 KiB)
  ceef28c58c994145ee07fcb604d8af42.dic (539 KiB)
  c879c998bbf83faea2aa1a3cb1f11ae9.dic (538 KiB)
  e04d47f0a6d463ec17cb2285c146347a.dic (541 KiB)
  d62f8312e4341a44bc699e673f4c10ab.dic (4.22 MiB)
  0ad6978d8b20fc95ae666fa5a032b5e6.dic (4.22 MiB)
  7085782c2e6cd983e926e46bad2e757f.dic (4.21 MiB)
  72527085f31d8bafb0884755ab6ecab9.aff (251 KiB)
  c6e54673e081035a39b777d7952540b3.dic (1.05 MiB)
  cd737cb950f32dd93445f2a166c36a21.dic (841 KiB)
  b9550cf15a5afc8a005049462bd40aef.dic (446 KiB)

A few months back I discussed with @goanpeca (who consistently nudged me on this - thanks ;)) how we could integrate the dictionaries with language packs. We can now start moving towards that goal, as the extension is in the new organization and #48 ports it to the federated extensions toolset. Opening this issue to track progress and exchange ideas.

One option would be to add a server extension endpoint, as in: https://github.com/jupyterlab/extension-cookiecutter-ts/blob/3.0/%7B%7Bcookiecutter.python_name%7D%7D/src/index.ts
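
As a rough illustration of what such an endpoint could return, here is a minimal sketch of the server-side logic only. The function names, the JSON shape, and the assumption that each language ships as a matching `.aff`/`.dic` pair in a single directory are all hypothetical; a real server extension would wrap this in a request handler.

```python
import json
from pathlib import Path


def list_dictionaries(directory):
    """Return one entry per language that has both a .aff and a .dic file."""
    root = Path(directory)
    entries = []
    for aff in sorted(root.glob("*.aff")):
        dic = aff.with_suffix(".dic")
        # Skip incomplete pairs: both files are needed to use a dictionary.
        if dic.is_file():
            entries.append({"id": aff.stem, "aff": aff.name, "dic": dic.name})
    return entries


def dictionaries_response(directory):
    """Serialize the listing; a request handler would send this as the body."""
    return json.dumps({"dictionaries": list_dictionaries(directory)})
```

With a listing like this, the frontend could fetch only the files for the languages the user actually selects instead of bundling every dictionary.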

ocordes commented 3 years ago

There is also another issue with these files. I tried to use an Italian dictionary, but Typo.js is not able to load this file within seconds. The problem is that the hunspell rules are so powerful that a few lines can expand into a very large word list. I want to check how we can circumvent this problem. Some ideas are:

transferring only unrolled word lists to the browser (zipped to reduce transfer bandwidth)
using an API call to the server to check words, where fast code can handle bigger hunspell dictionaries
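
Both ideas can be sketched in a few lines, assuming the word list has already been unrolled from the hunspell files (for example with a tool such as hunspell's `unmunch`). The function names below are hypothetical and only illustrate the data flow, not an existing API.

```python
import gzip


def pack_word_list(words):
    """Gzip a newline-separated, de-duplicated, sorted word list for transfer."""
    payload = "\n".join(sorted(set(words))).encode("utf-8")
    return gzip.compress(payload)


def unpack_word_list(blob):
    """Inverse of pack_word_list; returns a set for fast membership checks."""
    text = gzip.decompress(blob).decode("utf-8")
    return set(text.split("\n")) if text else set()


def check_words(candidates, known_words):
    """Server-side variant: return the candidates missing from the word list."""
    return [word for word in candidates if word not in known_words]
```

The first two functions correspond to shipping a compressed, pre-expanded list to the browser; `check_words` corresponds to keeping the list on the server and sending only the words to be checked. The lookup is case-sensitive here, which is an illustrative choice.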

Anyway, we need a better solution. BTW, I was in contact with the main author of Typo.js and he is not actively working on the package. I sent him a pull request two months ago which he wants to work on, but there is no change so far! (You can check this bug: "hEllo" is marked as correct ... ;-)) So we should keep this in mind ...

krassowski commented 3 years ago

I just uploaded the first federated version to PyPI; it weighs >10 MB: https://test.pypi.org/project/jupyterlab-spellchecker/0.4.0/

So this is also an argument for splitting the dictionaries into separate packages... I guess we could use entry-points so that the user could do:

pip install jupyterlab-spellchecker[german]
pip install jupyterlab-spellchecker[english]
pip install jupyterlab-spellchecker[all]

etc.

I will think about how to make this easy and how fetching from the backend fits into all of this.
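
A minimal sketch of how the core package might discover separately installed dictionary packages through entry points. The group name `jupyterlab_spellchecker.dictionaries` is hypothetical (nothing in the extension defines it today), and each dictionary package is assumed to register one entry point per language in its packaging metadata.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group name; an assumption for illustration only.
GROUP = "jupyterlab_spellchecker.dictionaries"


def discover_dictionaries(group=GROUP):
    """Map entry point names (e.g. 'german') to the objects they load."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object that supports select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a dict of group name -> entry points.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

With no dictionary packages installed, this returns an empty dict, so the extension could start without any bundled dictionaries and pick up whatever the user has installed.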

jupyterlab-contrib / spellchecker

Lazy-load dictionaries, integrate with language packs #49