mapbox / geobuf

A compact binary encoding for geographic data.
ISC License

Add compress function to return object with reduced memory usage #123

Open TysonAndre opened 2 years ago

TysonAndre commented 2 years ago

Objects are modified in place; each array is replaced with a copy that has exactly the capacity it needs.

This is useful when the polygons will be kept for a long time. By default, arrays are allocated with extra capacity that is never used. (An empty array currently starts with a capacity of 16 elements, which is wasteful for decoded points of length 2.) slice() allocates a new array, seemingly with its capacity shrunk to fit, according to process.memoryUsage().
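As a rough illustration of the idea (not the code in this PR), a recursive pass could copy every array with slice() and walk into nested objects. The name `compressSketch` is hypothetical, and the assumption that slice() returns an array with no spare capacity is an engine detail that may vary:

```javascript
// Hypothetical sketch; the function name and structure are assumptions,
// not the implementation proposed in this PR.
function compressSketch(value) {
  if (Array.isArray(value)) {
    // Copy the array; the assumption is that the copy is allocated
    // with exactly value.length slots and no reserved extra capacity.
    const copy = value.slice();
    for (let i = 0; i < copy.length; i++) {
      copy[i] = compressSketch(copy[i]);
    }
    return copy;
  }
  if (value !== null && typeof value === 'object') {
    // Plain objects are modified in place; only their arrays are replaced.
    for (const key of Object.keys(value)) {
      value[key] = compressSketch(value[key]);
    }
  }
  return value;
}
```

Because objects are updated in place, the caller keeps the same top-level object, while any previously held reference to a nested array points at the old, uncompressed copy.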

There is also an option to deduplicate identical points, which may be useful for collections of polygons that share points, as well as when calling compress multiple times with different objects. It is only safe for read-only use, so it is disabled by default.
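To show why deduplication is only safe for read-only data, here is a minimal sketch that shares one array per distinct point through a Map. The helper `dedupePoints` and its string key are assumptions made for illustration; the PR may key its cache differently:

```javascript
// Hypothetical sketch: share a single array instance per distinct point.
// The cache can be reused across calls to share points between objects.
function dedupePoints(coords, cache = new Map()) {
  // Treat an array whose first element is a number as a point.
  if (coords.length > 0 && typeof coords[0] === 'number') {
    const key = coords.join(',');
    const cached = cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const point = coords.slice();
    cache.set(key, point);
    return point;
  }
  // Otherwise recurse into nested rings or polygons.
  return coords.map((child) => dedupePoints(child, cache));
}

const ring = dedupePoints([[0, 0], [1, 1], [0, 0]]);
// ring[0] and ring[2] are now the same array, so writing to one
// also changes the other.
ring[0][0] = 5;
```

After the last line, `ring[2][0]` is also 5, which is exactly the aliasing hazard that makes the option unsuitable for data that will be mutated.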

For example, in node-geo-tz issue 131, I saw the following changes in memory usage and decoding time on Linux (time zone polygons for the entire world map). This is useful for long-running processes that repeatedly use the objects.

  1. No Override: 1.280 GB (1.8 seconds)
  2. Defaults for cache (no numericArrayCache): 0.708 GB (3.4 seconds)
  3. Adding the second Map (numericArrayCache): 0.435 GB (6.7 seconds)

Note that if the object is not kept around, there wouldn't be a reason to call compress.

What are your thoughts about adding an optional boolean parameter, decode(pbf, compressData = false), and calling compress only if compressData === true? (Strict equality guards against accidentally passing extra parameters, e.g. from Array.prototype.forEach.)
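A small sketch of that guard, using stand-in functions so it runs on its own. `makeDecode` and the stand-in `decode` and `compress` arguments are hypothetical and only mimic the proposed signature:

```javascript
// Hypothetical sketch of the proposed signature; decode and compress here
// are placeholders supplied by the caller, not geobuf's real functions.
function makeDecode(decode, compress) {
  return function decodeWithOption(pbf, compressData = false) {
    const obj = decode(pbf);
    // Only an explicit `true` enables compression. Array callbacks pass
    // (element, index, array), so the index arrives as compressData.
    return compressData === true ? compress(obj) : obj;
  };
}

let compressCalls = 0;
const decodeStub = makeDecode(
  (pbf) => ({ decoded: pbf }),
  (obj) => { compressCalls += 1; return obj; }
);

// The index 1 is truthy, but it is not strictly equal to true,
// so neither element is compressed here.
['a', 'b'].forEach((item, index) => decodeStub(item, index));
```

With a looser check such as `if (compressData)`, the second element in the loop above would have been compressed by accident.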

Closes #122