mapbox / geobuf

A compact binary encoding for geographic data.
ISC License

Performance issue: Points from geobuf polygons use more array capacity than needed, wasting memory. #122

Open TysonAndre opened 2 years ago

TysonAndre commented 2 years ago

Arrays in Node.js need to be able to add elements quickly without resizing frequently, so they have both a size and a capacity.

For example, in the geo-tz module (which provides time zone data for the entire world), geobuf creates 'Polygon' objects with readLinePart. The point arrays are created with size 2 but keep excess capacity (16) that is never freed.

Replacing coords.push(p) with coords.push(p.slice()) in node_modules/geobuf/decode.js reduced the memory used by loading the entire quad tree from 1,282,134,016 to 528,089,088 bytes for me (1.28GB to 0.53GB) in 64-bit Node.js. The sliced copies do not have excess capacity.
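A minimal sketch of the change, for illustration only. readPoints is a hypothetical stand-in for the point loop in decode.js (the real code reads from a protobuf, this reads from a flat array), and heapDelta is a hypothetical measuring helper. That slice() returns a copy without excess capacity is taken from the measurement above, not from any documented V8 guarantee.

```javascript
// Hypothetical stand-in for the point-reading loop in geobuf's decode.js.
// values: flat list of coordinates, dim: numbers per point,
// trim: whether to store a slice() copy instead of the pushed-to array.
function readPoints(values, dim, trim) {
    const coords = [];
    for (let i = 0; i < values.length; i += dim) {
        const p = [];
        for (let d = 0; d < dim; d++) {
            p.push(values[i + d]); // push() may leave spare capacity behind
        }
        // The proposed fix: keep a copy of p rather than p itself.
        coords.push(trim ? p.slice() : p);
    }
    return coords;
}

// Hypothetical helper: rough heap growth caused by build().
// Run node with --expose-gc so gc() is available; without it the
// numbers are noisier.
function heapDelta(build) {
    const gc = typeof globalThis.gc === 'function' ? globalThis.gc : () => {};
    gc();
    const before = process.memoryUsage().heapUsed;
    const result = build(); // keep a reference so the arrays stay alive
    gc();
    const after = process.memoryUsage().heapUsed;
    return { result, bytes: after - before };
}
```

To compare the two variants, call heapDelta(() => readPoints(values, 2, false)) and heapDelta(() => readPoints(values, 2, true)) on a few million coordinates and compare the bytes fields. The exact numbers will depend on the Node.js version.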

From babel/issues/6233

In V8, an empty array gets a buffer of 16 elements. This gives it a little bit of room to grow without needing reallocation. Once you add a 17th element, the buffer expands by 50%. This formula continues after that every time reallocation is needed.
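As a rough illustration, the growth rule can be modelled exactly as the quote describes it. quotedCapacity is a hypothetical helper, and the formula is an assumption taken from the quote at face value; V8's real policy may differ between versions.

```javascript
// Models the rule as quoted: start with a buffer of 16 elements and
// grow it by 50% whenever it is too small. Not V8's actual code.
function quotedCapacity(length, initial = 16) {
    let capacity = initial;
    while (capacity < length) {
        capacity += Math.floor(capacity / 2); // expand by 50%
    }
    return capacity;
}

// Slots allocated but unused for an array of the given length.
function wastedSlots(length) {
    return quotedCapacity(length) - length;
}
```

Under this model a 2-element point occupies a 16-element buffer, so wastedSlots(2) is 14, which is the kind of overhead described in this issue.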

Note that new Array(size) would be worse for runtime performance and should be avoided: the engine needs more internal array representations to handle arrays with holes or mixed types, and the optimizer cannot generate code that is as efficient for them.

Related to https://github.com/evansiroky/node-geo-tz/issues/131