Currently the dictionary is delivered to the frontend as a 3.4MB gzipped JSON list of entries. Search then requires iterating over all 121K entries to find matches. This is less than ideal, but has worked so far because modern browsers are fast.
What would be better is:
- During data generation, precompute a trie of 1-grams or 2-grams of all searchable information (simplified, pinyin, definition)
- Ship that to the frontend and use it for search
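One way the precomputed index could look, sketched here as an n-gram inverted index built at data-generation time (a flat realization of the proposed trie; field names and function names are hypothetical, not from any existing script):

```python
from collections import defaultdict

def ngrams(text, n):
    """All n-grams of a string (e.g. 2-grams of 'hao' -> {'ha', 'ao'})."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(entries):
    """Map each 1-gram and 2-gram of the searchable fields to entry IDs.

    The resulting dict can be serialized to JSON and shipped alongside
    (or instead of) the full entry list.
    """
    index = defaultdict(set)
    for entry_id, entry in enumerate(entries):
        for field in ("simplified", "pinyin", "definition"):
            text = entry[field].lower()
            for n in (1, 2):
                for gram in ngrams(text, n):
                    index[gram].add(entry_id)
    return index

def search(index, query):
    """Intersect the posting lists of the query's 2-grams
    (falling back to 1-grams for single-character queries)."""
    query = query.lower()
    grams = ngrams(query, 2) if len(query) >= 2 else ngrams(query, 1)
    postings = [index.get(g, set()) for g in grams]
    if not postings:
        return set()
    return set.intersection(*postings)
```

Intersecting posting lists turns search from a scan over 121K entries into a handful of dictionary lookups; candidates would still need a final substring check to filter out false positives from n-gram matches spanning field boundaries.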
Also, optionally (and this should be a separate bug):
- Cut down the number of entries shipped to the frontend by default to the most common 10/20/50K
- Design a way for the frontend to access information that wasn't included in the initial payload
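One simple design for reaching the long tail: at data-generation time, split the full entry list into fixed-size JSON shards, and have the search index store entry IDs so the frontend can compute which shard to fetch on demand. A minimal sketch (shard size and paths are illustrative assumptions):

```python
import json
import os

SHARD_SIZE = 1000  # hypothetical; tune against typical gzipped payload sizes

def write_shards(entries, out_dir):
    """Split the full entry list into fixed-size JSON shards
    (e.g. shards/0.json, shards/1.json, ...) for on-demand fetching."""
    os.makedirs(out_dir, exist_ok=True)
    for i in range(0, len(entries), SHARD_SIZE):
        shard_id = i // SHARD_SIZE
        path = os.path.join(out_dir, f"{shard_id}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(entries[i:i + SHARD_SIZE], f, ensure_ascii=False)

def shard_for(entry_id):
    """Map an entry ID from the search index to the shard that holds it."""
    return entry_id // SHARD_SIZE
```

The frontend would ship with the most common entries plus the index; a search hit outside the initial payload triggers a fetch of `{shard_for(id)}.json` and a cache of the result.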
cc @jagLas
Before embarking on a huge rewrite, there is already a project that has built a ton of infrastructure around parsing and processing CC-CEDICT, so it'd be worth seeing if any of that could be leveraged here:
https://github.com/mreichhoff/HanziGraph/tree/main/scripts