xunhuang / covid-19

MIT License

Prep for pulling data out of the bundle, doing this in two steps to avoid breaking page loads #74

Closed aschleck closed 4 years ago

aschleck commented 4 years ago

Copying and pasting what you said on the other PR:

If you run normalize_data.js and change where the file goes, then read it back, it should be okay, I think. That said, perf-wise, is fetching from public/ that different from being in the bundle, since the app needs the file to run anyway? I have no big objection, just a small concern that the two steps will serialize (load the bundle, execute the bundle, then fetch). However, one benefit is that once the file is in public/, we can point whoever wants to use it at a direct download instead of directing them to GitHub.

Those are all great points!

I think the bundle size matters for people on slow connections. For example, if you throttle Chrome to "Slow 3G" speed, it takes 29 seconds before the splash screen shows up (!), and then the app loads immediately after. As I understand it, users bounce when they're stuck at a white page, but may not bounce once something loads. I can't figure out how to get "node start" to serve me a production build locally to determine the exact savings, but cutting hundreds of kilobytes ought to do something. The other thing on my mind is that the longer these pipelines run, the more data they accumulate, so the problem will just get linearly worse.

The serialization is unfortunate, and it's made worse by having to parse the response of the XHR... But I think browsers are designed to parse JSON fast, and the time spent at the splash screen shouldn't add a noticeable delay before the data is downloaded.
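To make the two-step loading concrete, here's a minimal sketch of what fetching the data from public/ after the app shell loads might look like. `fetchAllData` and `DATA_URL` are illustrative names, not code from this PR; the actual path and error handling would depend on the app.

```javascript
// Hypothetical sketch: load the data file from public/ instead of bundling it.
// DATA_URL and fetchAllData are assumptions for illustration, not the PR's code.
const DATA_URL = '/AllData.min.json';

async function fetchAllData(fetchImpl = fetch) {
  const res = await fetchImpl(DATA_URL);
  if (!res.ok) {
    throw new Error(`Failed to load ${DATA_URL}: ${res.status}`);
  }
  // Browsers parse JSON responses quickly, so this step is cheap
  // relative to the network transfer itself.
  return res.json();
}
```

The `fetchImpl` parameter is just a seam so the function can be exercised without a real network; in the app it would default to the browser's `fetch`.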


I compressed county_gps by hand (removing unneeded fields and whitespace) and copied AllData.json to AllData.min.json for now (updating normalize_data.js to actually minify it on subsequent runs). I don't know your production setup, so it's safest to leave the code alone for now and just put these files here first. Then, as a second step, we can update the code to look at those files and drop the data from the bundle.

xunhuang commented 4 years ago

These changes look safe.

One cheap way to trim is to cut out data older than 30 days (at least for county-level data); we don't show it anyway. On minifying, isn't the build process already doing that?

I will write a README for how the "devops" is run, but roughly it goes like this:

1) I run a script on my computer: watch -n 300 "sh refreshdata.sh; node refresh.js"
2) refreshdata.sh pulls most data from everywhere, except for the JHU data; refresh.js does that.
3) refresh.js fetches JHU and compares it to see if there are any significant changes; if so, it runs normalize_data.js, then builds and deploys.
4) Currently I don't make the devops pull from the tree directly, as I do a sanity check before deploying, especially for big changes.
5) The build/deploy command is "npm run deploy" (which runs "npm run build" first).
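The "cut out data older than 30 days" idea could be sketched as a simple filter over each time series before writing the output file. The data shape assumed here (an array of `{ date, confirmed }` points) and the name `trimOldEntries` are illustrative, not the repo's actual schema:

```javascript
// Hypothetical sketch of trimming county-level time series to the last N days.
// The { date, confirmed } point shape is an assumption for illustration.
function trimOldEntries(series, now = Date.now(), days = 30) {
  const cutoff = now - days * 24 * 60 * 60 * 1000;
  return series.filter((point) => new Date(point.date).getTime() >= cutoff);
}
```

This would run in normalize_data.js so the trimmed file is what ships, while the full history stays in the source data.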

aschleck commented 4 years ago

Ah, I never really thought about how we only show 30 days; yeah, I guess that would cut it down. It seems kind of sad to do, though, since I assume you'd like to let users see the full window if we had infinite time for everything? I was thinking that if it becomes necessary, it'd be easy to split AllData.json into one national-level file and a file per state.

The build process does compress JSON files that are require'd, but I don't think it compresses anything in public/. I'd believe you if you told me I was wrong, though. I hope I am!

Re devops, my concern is that if you have two machines running this code behind a load balancer, one machine could be updated and serve the JS referencing this file to a user before the other machine has the data in public/. A user would then be told to fetch from public/, their XHR would land on the un-updated machine, and they'd get a 404.
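One cheap client-side mitigation for that version-skew window would be to retry the fetch a few times before giving up, on the assumption that a later request may land on an updated machine. `fetchWithRetry` is an illustrative name, not existing code:

```javascript
// Hypothetical mitigation for the load-balancer version-skew concern above:
// retry a failed fetch from public/ a few times before surfacing an error.
async function fetchWithRetry(url, attempts = 3, delayMs = 1000, fetchImpl = fetch) {
  for (let i = 0; i < attempts; i++) {
    const res = await fetchImpl(url);
    if (res.ok) return res.json();
    if (i < attempts - 1) {
      // Back off briefly; a later request may hit an updated machine.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Gave up fetching ${url} after ${attempts} attempts`);
}
```

It doesn't fix the underlying skew (only an atomic deploy or versioned file names would), but it narrows the user-visible failure window.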

xunhuang commented 4 years ago

I don't think it will compress the stuff in public/ either.

Oh yeah, another approach would be to slice the files up into county/state/national levels, and maybe make the loading slightly more intelligent: load what's needed for the first screen first, then load the rest in the background. Things to think about.
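The slicing idea above could be sketched as an eager fetch for the first screen's region plus fire-and-forget fetches for the rest. The per-state file layout (`/data/state-XX.json`) and the function name are assumptions about a scheme that doesn't exist yet:

```javascript
// Hypothetical sketch of sliced, prioritized loading: fetch the first screen's
// state eagerly, then warm the remaining states in the background.
// The /data/state-XX.json layout is an assumption, not the repo's current layout.
async function loadInitialThenRest(firstState, allStates, fetchImpl = fetch) {
  const first = await (await fetchImpl(`/data/state-${firstState}.json`)).json();
  // Fire-and-forget: don't block first paint on the remaining states.
  allStates
    .filter((s) => s !== firstState)
    .forEach((s) => fetchImpl(`/data/state-${s}.json`).catch(() => {}));
  return first;
}
```

The background requests land in the browser cache, so navigating to another state later would be near-instant.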

I'd suggest prioritizing something cheap first, then fixing UX, then doing deeper perf optimization.
