opentraffic / datastore

OTv2: centralized ingest and aggregation of anonymous traffic data
GNU Lesser General Public License v3.0

Consider serving tiles where gzip is enabled #47

Closed (louh closed this 7 years ago)

louh commented 7 years ago

From a test I did earlier today, a server that gzips tiles on-the-fly gives us a >90% savings on the transferred file size:

[Screenshot, 2017-07-27: browser network panel showing, per tile request, the size transferred over the wire versus the uncompressed size]

If you look on the right side, the top number is the actual size transferred over the wire; the bottom number is the uncompressed size. Note that these were the original JSON files with unmangled properties. I did a similar test with mangled properties, and the compressed sizes came out nearly identical, since gzip already compresses away the same repeated property names that mangling shortens.
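For anyone who wants to reproduce this kind of measurement locally, here is a minimal sketch (the file path is hypothetical, standing in for one of the exported tiles) that gzips a tile's JSON and reports the savings:

```python
# Sketch: measure gzip savings on a single exported tile.
# "415.json" is a placeholder path, not a file shipped with this repo.
import gzip

with open("415.json", "rb") as f:
    raw = f.read()
compressed = gzip.compress(raw)

savings = 1 - len(compressed) / len(raw)
print(f"uncompressed: {len(raw):,} bytes")
print(f"gzipped:      {len(compressed):,} bytes")
print(f"savings:      {savings:.0%}")  # ~90% for repetitive JSON like these tiles
```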

This is significantly better for download performance (though it says nothing yet about memory or processing performance). In practice, a user who previously waited about 3 minutes to download 600 MB now waits about 20–30 seconds for 60 MB. This is excellent. My goal was to get our download sizes to about 60 MB per request. Even if we spent a lot of time rejiggering how we roll up data, or deciding which properties to include, that would probably, at best, get us 50% of the way there. By gzipping the tiles, we get 90% savings immediately with only a tweak to the infrastructure.

Please note that this does not mean serving files that were gzipped manually and transmitted as raw .gz blobs. When a file is transmitted over the wire with gzip content encoding, the browser decompresses it automatically. If a file is instead transmitted as a plain gzipped file, the browser does not decompress it, and we would need something in JavaScript to gunzip the file client side, which is not optimal. Therefore, we must have the server serve files with gzip compression turned on, which is not the same as having the export process create gzipped files.
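To make the distinction concrete, here is a small sketch (the URLs are hypothetical) showing what an HTTP client sees in each case. When the response carries `Content-Encoding: gzip`, the client library decodes the body transparently; when the server hands back a raw .gz file, the payload arrives still compressed and must be gunzipped by hand:

```python
# Sketch of the two serving modes described above; URLs are illustrative.
import gzip
import requests

# Case 1: served with `Content-Encoding: gzip` -> already decoded for us.
r = requests.get("https://tiles.example.com/0/0/002/415.json")
print(r.headers.get("Content-Encoding"))  # "gzip"
data = r.json()  # requests has transparently decompressed the body

# Case 2: served as a raw gzipped blob -> compressed bytes arrive as-is.
r = requests.get("https://tiles.example.com/0/0/002/415.json.gz")
data = gzip.decompress(r.content)  # caller must gunzip explicitly
```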

In summary, here are my recommendations for now:

drewda commented 7 years ago

CloudFront can only dynamically compress files smaller than 10 MB, so that won't help with these large files.

But what we can probably do with S3 is the following (a sketch of these steps in code follows below):

  1. Gzip the files before uploading
  2. Set the content-type metadata on each file to `application/json`
  3. Set the content-encoding metadata on each file to `gzip`

(See http://www.rightbrainnetworks.com/blog/serving-compressed-gzipped-static-files-from-amazon-s3-or-cloudfront/ )
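Here is one way those three steps could look with boto3; the local filename is hypothetical, and the bucket/key are taken from the example URL below rather than from any confirmed project configuration:

```python
# Sketch: pre-gzip a tile and upload it to S3 with the metadata that
# makes browsers decompress it transparently. Names are illustrative.
import gzip
import shutil

import boto3

# 1. Gzip the file before uploading.
with open("415.json", "rb") as src, gzip.open("415.json.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# 2 & 3. Upload with content-type and content-encoding set, keeping the
# original .json key so clients request the same URL as before.
s3 = boto3.client("s3")
s3.upload_file(
    "415.json.gz",
    "speed-extracts",
    "2017/0/0/002/415.json",
    ExtraArgs={"ContentType": "application/json", "ContentEncoding": "gzip"},
)
```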

drewda commented 7 years ago

@louh try fetching https://speed-extracts.s3.amazonaws.com/2017/0/0/002/415.json. I just manually set the headers on that file.
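A quick way to verify the headers on that file without downloading the body is a HEAD request; the expected values in the comments follow from the steps above:

```python
# Sketch: confirm the metadata on the manually configured S3 object.
import requests

r = requests.head("https://speed-extracts.s3.amazonaws.com/2017/0/0/002/415.json")
print(r.headers.get("Content-Type"))      # expect: application/json
print(r.headers.get("Content-Encoding"))  # expect: gzip
```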

drewda commented 7 years ago

Closing this, since PBF data exports from Datastore are gzipped.