msiebuhr / charcod.es

Small webpage for finding the odd unicode char code.
https://charcodes.netlify.app/
ISC License

Progressive download/parsing/indexing of codepoints #16

Open msiebuhr opened 12 years ago

msiebuhr commented 12 years ago

The application stalls quite a lot on slow machines/devices, so perhaps we should split the index into smaller parts, possibly with various chunk sizes, so we can adapt to faster/slower machines and network connections. E.g., naming the files data-FROM%-TO%.json:

# 5% chunks
data-0-5.json
data-5-10.json
…

# 10% chunks
data-0-10.json
data-10-20.json
…

# 25% chunks
data-0-25.json
data-25-50.json
…

# 50% chunks
data-0-50.json
data-50-100.json

Then the client could start out downloading data-0-10.json and parsing it. If that takes too long, degrade to 5% chunks, and if it's fast, upgrade to 25% chunks.

We'd have to keep some more data lying around (about 2MB per chunk size) and, more difficult, figure out a dynamic download client.
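For concreteness, a minimal sketch of what that dynamic client might look like in browser-side TypeScript. The file names follow the scheme above, but the timing thresholds, the `onChunk` callback, and the boundary-alignment rule are assumptions of mine, not repo code:

```typescript
// Illustrative sketch only: adaptively walk the pre-generated chunk files.

const CHUNK_SIZES = [5, 10, 25, 50]; // available granularities, in percent

// Largest size <= desired whose chunk grid lines up with `from`, so we only
// ever request files that actually exist in the fixed sets above.
function alignedSize(from: number, desired: number): number {
  const fits = CHUNK_SIZES.filter((s) => s <= desired && from % s === 0);
  return fits.length ? Math.max(...fits) : CHUNK_SIZES[0];
}

async function loadIndex(onChunk: (codepoints: unknown[]) => void): Promise<void> {
  let idx = 1; // start with 10% chunks
  let from = 0;

  while (from < 100) {
    const size = alignedSize(from, CHUNK_SIZES[idx]);
    const to = from + size;

    const started = performance.now();
    const res = await fetch(`data-${from}-${to}.json`);
    onChunk(await res.json()); // hand the parsed codepoints to the indexer
    from = to;

    // Degrade on slow round-trips, upgrade on fast ones.
    const elapsed = performance.now() - started;
    if (elapsed > 2000 && idx > 0) idx--;
    else if (elapsed < 500 && idx < CHUNK_SIZES.length - 1) idx++;
  }
}
```

Snapping the size to a boundary that `from` divides evenly keeps every request on one of the fixed grids above, so switching granularity mid-download never asks for a file that wasn't pre-generated.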

Munter commented 12 years ago

An alternative could be offloading the heavy lifting to Web Workers to keep the interface responsive.
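Roughly, a sketch of that idea; the file names and message shapes here are illustrative, not taken from the app:

```typescript
// worker.ts -- fetch and parse the index off the main thread, so the UI
// never blocks on a multi-megabyte JSON.parse.
// (Assumes TypeScript's "webworker" lib for the worker globals.)
self.onmessage = async () => {
  const res = await fetch("data.json");
  const codepoints: unknown[] = await res.json();
  // ...any heavy indexing would also happen here, off the main thread...
  postMessage(codepoints);
};

// main.ts -- the page just spawns the worker and renders results as they come.
declare function renderIndex(codepoints: unknown[]): void; // stand-in for the app's UI update
const worker = new Worker("worker.js");
worker.onmessage = (e: MessageEvent<unknown[]>) => renderIndex(e.data);
worker.postMessage("start");
```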

msiebuhr commented 8 years ago

Another way could be to include some top percentage of the codes in the initial download.

Picking all these out from the main data-set weighs in at 11KB gzipped (66KB plain), which would still be quite a win.

jq '[.[] | select(.b == "ascii" or .b == "misc_symbols" or .b == "misc_pictographs")]' -c data.json | gzip | wc -c
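On the client side that could become a two-stage load: serve the small subset first, then swap in the full data set when it arrives. A sketch, with both file names made up for illustration:

```typescript
// Hypothetical two-stage load building on the jq experiment above.
async function loadCodepoints(onData: (codepoints: unknown[]) => void): Promise<void> {
  // Stage 1: the ~11KB gzipped subset makes the app usable almost immediately.
  const top = await fetch("data-top.json");
  onData(await top.json());

  // Stage 2: replace the subset with the complete index in the background.
  const full = await fetch("data.json");
  onData(await full.json());
}
```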
msiebuhr commented 8 years ago

BTW, misc_pictographs alone would weigh in at 7KB gzipped.

Considering the background image is 11KB compressed and the JS bundle is 45KB, I think we'd be OK with all of the proposed subsets.