srawlins / timezone

Time zone database and time zone aware DateTime object for Dart.
BSD 2-Clause "Simplified" License

Use unicode literals instead of inefficient base64 #109

Closed: untp closed 2 years ago

untp commented 3 years ago

Dart strings are UTF-16 encoded, so any data can be embedded as Unicode literals. Converting to base64 is inefficient because it stores only 6 bits of data per character, which increases binary size.
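A minimal sketch of the idea, not the PR's actual code: raw bytes are packed two per UTF-16 code unit and read back with `codeUnitAt`. The helper names `packBytes`, `unpackBytes`, and `base64Chars` are hypothetical and exist only for this illustration.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Packs bytes two per UTF-16 code unit, low byte first.
/// Hypothetical helper; the PR may lay out its data differently.
String packBytes(Uint8List bytes) {
  final units = Uint16List((bytes.length + 1) ~/ 2);
  for (var i = 0; i < bytes.length; i++) {
    units[i >> 1] |= bytes[i] << ((i & 1) * 8);
  }
  // Dart strings may hold arbitrary 16-bit code units, including lone surrogates.
  return String.fromCharCodes(units);
}

/// Reverses [packBytes]. The original byte count must be supplied because
/// an odd-length input is padded with one zero byte.
Uint8List unpackBytes(String packed, int length) {
  final out = Uint8List(length);
  for (var i = 0; i < length; i++) {
    out[i] = (packed.codeUnitAt(i >> 1) >> ((i & 1) * 8)) & 0xFF;
  }
  return out;
}

/// Number of characters base64 needs for the same bytes, for comparison.
int base64Chars(Uint8List bytes) => base64Encode(bytes).length;
```

For a 7-byte input, `packBytes` yields 4 code units (8 bytes of UTF-16), while base64 needs 12 characters.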

I tested a simple example program to compare the old and new implementations, compiled with dart compile exe. The baseline file size is 4,406,528 bytes (baseline created by replacing the embedded data string with an empty string). The base64 encoding file size is 4,922,792 bytes (current implementation). The Unicode literals file size is 4,790,688 bytes (this PR).
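The overhead each encoding adds on top of the baseline follows directly from the three reported sizes. The small sketch below only does that subtraction; the constant and function names are made up for the illustration.

```dart
// File sizes reported above, in bytes.
const baselineBytes = 4406528; // embedded data replaced by an empty string
const base64Bytes = 4922792; // current implementation
const unicodeBytes = 4790688; // this PR

/// Bytes added to the executable by the embedded data.
int overhead(int size) => size - baselineBytes;

/// Bytes saved by switching from base64 to Unicode literals.
int savedBytes() => base64Bytes - unicodeBytes;

/// Saving as a percentage of the base64 overhead.
double savedPercent() => 100 * savedBytes() / overhead(base64Bytes);
```

With these numbers the embedded data costs 516,264 bytes as base64 and 384,160 bytes as Unicode literals, a saving of 132,104 bytes, or roughly a quarter of the base64 overhead.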

Using Unicode literals decreases the binary size, and it is the most efficient method for embedding the data.

I also removed encode.dart because it is a duplicate of encode_dart.dart.

rakudrama commented 2 years ago

Unicode literals are inefficient when compiling to JavaScript. The UTF-16 code units in strings are mostly encoded as \uHHHH escapes. This gives a coding efficiency of <3 bits per byte (a six-character escape carries 16 bits, about 2.7 bits per byte), compared with 6 bits per byte in base64. The effect is that a Dart program compiled to JavaScript increases in size by 500kB.
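The coding-efficiency figures can be checked with a rough sketch. It assumes the worst case in which every code unit is written as a `\uHHHH` escape (the comment says "mostly", so real output will be somewhat better), and the function names are hypothetical.

```dart
import 'dart:convert';

/// Writes every code unit as a six-character \uHHHH escape.
/// Assumption: worst case; a real compiler leaves some characters unescaped.
String jsEscapeAll(String s) {
  final sb = StringBuffer();
  for (final unit in s.codeUnits) {
    sb.write(r'\u');
    sb.write(unit.toRadixString(16).padLeft(4, '0'));
  }
  return sb.toString();
}

/// Payload bits carried per byte of escaped JavaScript source.
/// Expects a non-empty string.
double escapedBitsPerByte(String s) => 16 * s.length / jsEscapeAll(s).length;

/// Payload bits carried per base64 character.
/// Expects a non-empty byte list.
double base64BitsPerByte(List<int> bytes) =>
    8 * bytes.length / base64Encode(bytes).length;
```

Each escape spends 6 source bytes on 16 bits of payload, so `escapedBitsPerByte` comes out at 16/6, while base64 carries 6 bits per character whenever the input length is a multiple of 3.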

It is also slower in JavaScript: the base64 code takes 45ms to initialize, while the new code takes 57ms. On the VM it is possible to make a Uint16List from a String very quickly. In JavaScript there is no easy way other than iterating over the String's code units and copying them to a List, at which point the conversion to Uint16List is as slow as the base64 decoder.
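The two conversion routes might look like the sketch below. Which call is the fast path on the VM is not stated in the comment, so treating `Uint16List.fromList(s.codeUnits)` as that path is an assumption; the function names are hypothetical.

```dart
import 'dart:typed_data';

/// Bulk conversion through the codeUnits view.
/// Assumption: this stands in for the fast VM route mentioned above.
Uint16List viaFromList(String s) => Uint16List.fromList(s.codeUnits);

/// Explicit per-code-unit copy, the route described for JavaScript.
Uint16List viaLoop(String s) {
  final out = Uint16List(s.length);
  for (var i = 0; i < s.length; i++) {
    out[i] = s.codeUnitAt(i);
  }
  return out;
}
```

Both functions return the same data; any timing difference between them depends on the platform and would need to be measured rather than assumed.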


I think there is also a problem with the size of the data regardless of the encoding. I don't think 45ms is reasonable for a synchronous initialization. 45ms is much more than an animation frame, so it will cause stuttering in the UI. Does the basic data format allow initialization of parts of the data on demand? (e.g. parse an index up front and parse a time zone only when requested).
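On-demand initialization could be sketched as follows. The layout is purely hypothetical (a name-to-range index over one byte blob) and is not the timezone package's actual data format; `ZoneEntry`, `LazyZoneStore`, and the parse callback are invented for the illustration.

```dart
import 'dart:typed_data';

/// Byte range of one zone inside the blob (hypothetical layout).
class ZoneEntry {
  const ZoneEntry(this.offset, this.length);
  final int offset;
  final int length;
}

/// Holds the raw blob and an index; parses a zone only on first lookup.
class LazyZoneStore<T extends Object> {
  LazyZoneStore(this._blob, this._index, this._parse);

  final Uint8List _blob;
  final Map<String, ZoneEntry> _index;
  final T Function(Uint8List bytes) _parse;
  final Map<String, T> _cache = {};

  /// Number of zones parsed so far, exposed to show the laziness.
  int parseCount = 0;

  T? lookup(String name) {
    final cached = _cache[name];
    if (cached != null) return cached;
    final entry = _index[name];
    if (entry == null) return null;
    parseCount++;
    // A view avoids copying the bytes before they are parsed.
    final view = Uint8List.sublistView(
        _blob, entry.offset, entry.offset + entry.length);
    return _cache[name] = _parse(view);
  }
}
```

With this shape, startup cost is limited to building the index, and each zone is parsed at most once, the first time it is requested.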