mapbox / geobuf

A compact binary encoding for geographic data.
ISC License

geobuf.encode fails out of memory when encountering defective multi-MultiPolygons #104

Open stevage opened 5 years ago

stevage commented 5 years ago

I'm having a weird problem where geobuf.encode is failing due to insufficient heap memory. At first, I assumed it was just an inherent problem with a huge file, but actually my input file is not that big. The JSON I'm converting is only 13MB minified, well below the 256MB string limit, and miles below various other limits of 1.6GB or 8GB I've heard of.

My code:

const fs = require('fs'); // needed for createWriteStream below
const geobuf = require('geobuf');
const json = require('./data/elb.json');
console.log(JSON.stringify(json).length);
const pbf = geobuf.encode(json, new (require('pbf'))());
const out = fs.createWriteStream('out.geobuf');
out.write(Buffer.from(pbf));
out.end();

Running like this:

$ node --max-old-space-size=8192 --trace-gc-verbose justgeobuf.js
[47104:0x102801c00] Shrinking page 0x3110fdb80000: end 0x3110fdc00000 -> 0x3110fdbc9000
[47104:0x102801c00] Shrinking page 0x31107b480000: end 0x31107b500000 -> 0x31107b485000
[47104:0x102801c00] Fast promotion mode: false survival rate: 70%
[47104:0x102801c00] Fast promotion mode: false survival rate: 93%
[47104:0x102801c00] Fast promotion mode: false survival rate: 40%
[47104:0x102801c00] Fast promotion mode: false survival rate: 93%
[47104:0x102801c00] Fast promotion mode: false survival rate: 48%
[47104:0x102801c00] Fast promotion mode: false survival rate: 98%
[47104:0x102801c00] Fast promotion mode: false survival rate: 49%
13265241
[47104:0x102801c00] Fast promotion mode: false survival rate: 0%
[47104:0x102801c00] Fast promotion mode: false survival rate: 26%
[47104:0x102801c00] Fast promotion mode: false survival rate: 62%
[47104:0x102801c00] Fast promotion mode: false survival rate: 33%
[47104:0x102801c00] Fast promotion mode: false survival rate: 57%
[47104:0x102801c00] Fast promotion mode: false survival rate: 69%
[47104:0x102801c00] Fast promotion mode: false survival rate: 4%
[47104:0x102801c00]     1195 ms: Heap growing factor 4.0 based on mu=0.970, speed_ratio=0 (gc=0, mutator=465190)
[47104:0x102801c00]     1195 ms: Grow: old size: 381619 KB, new limit: 1542588 KB (4.0)

<--- Last few GCs --->

[47104:0x102801c00]      865 ms: Scavenge 134.3 (155.2) -> 127.9 (157.2) MB, 18.5 / 0.0 ms  allocation failure
[47104:0x102801c00]      896 ms: Scavenge 136.7 (157.2) -> 132.1 (164.2) MB, 13.1 / 0.0 ms  allocation failure
[47104:0x102801c00]     1195 ms: Mark-sweep 473.7 (503.2) -> 372.9 (441.1) MB, 3.2 / 0.0 ms  (+ 1.8 ms in 5 steps since start of marking, biggest step 0.7 ms, walltime since start of marking 262 ms) finalize incremental marking via stack guard GC in old s

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1d902ef0427d]
Security context: 0x3110578a06a9 <JSObject>
    1: populateLine(aka populateLine) [/Users/stevebennett/odev/play/which-boundary/node_modules/geobuf/encode.js:~205] [pc=0x1d902eff99da](this=0x3110fdb822e1 <undefined>,coords=0x31108908d791 <JSArray[50139473]>,line=0x31105e756a69 <JSArray[2961]>,closed=0x3110fdb82381 <true>)
    2: writeMultiPolygon(aka writeMultiPolygon) [/Users/stevebennett/odev/play/w...

FATAL ERROR: invalid array length Allocation failed - JavaScript heap out of memory
 1: node::Abort() [/usr/local/bin/node]
 2: node::FatalTryCatch::~FatalTryCatch() [/usr/local/bin/node]
 3: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/usr/local/bin/node]
 4: v8::internal::Heap::AllocateUninitializedFixedDoubleArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 5: v8::internal::Factory::NewFixedDoubleArray(int, v8::internal::PretenureFlag) [/usr/local/bin/node]
 6: v8::internal::(anonymous namespace)::ElementsAccessorBase<v8::internal::(anonymous namespace)::FastPackedDoubleElementsAccessor, v8::internal::(anonymous namespace)::ElementsKindTraits<(v8::internal::ElementsKind)4> >::GrowCapacity(v8::internal::Handle<v8::internal::JSObject>, unsigned int) [/usr/local/bin/node]
 7: v8::internal::Runtime_GrowArrayElements(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
 8: 0x1d902ef0427d

It fails similarly with json2geobuf, even after modifying the shebang to increase max_old_space_size.

The source file elb.json is here: https://www.dropbox.com/s/0wakul4l5n3zihv/elb.json?dl=1

stevage commented 5 years ago

Hmm, I think the problem is specific to MultiPolygons. That file contains all 47 federal electoral districts in NSW, Australia, each represented as a MultiPolygon. (Most contain a single polygon; one contains 95.)

This file contains the same data, but as 276 individual polygons, with the properties duplicated on them. geobuf.encode processes it just fine.
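
For readers who want to try the same workaround, here is a minimal sketch of splitting MultiPolygon features into individual Polygon features with the properties copied onto each. The function name `explodeMultiPolygons` is hypothetical, and the thread does not say which tool actually produced the split file, so treat this as an illustration rather than the method used above.

```javascript
// Hypothetical helper: turn every MultiPolygon feature into one Polygon
// feature per member polygon, duplicating the properties on each.
// Features with any other geometry type (or a null geometry) pass through.
function explodeMultiPolygons(featureCollection) {
  const features = [];
  for (const feature of featureCollection.features) {
    const geometry = feature.geometry;
    if (geometry && geometry.type === 'MultiPolygon') {
      for (const polygonCoords of geometry.coordinates) {
        features.push({
          type: 'Feature',
          // Shallow copy so the output features do not share one object.
          properties: { ...feature.properties },
          geometry: { type: 'Polygon', coordinates: polygonCoords }
        });
      }
    } else {
      features.push(feature);
    }
  }
  return { type: 'FeatureCollection', features };
}
```

The sketch assumes the input is a well-formed FeatureCollection. It does not inspect how deeply the coordinates are nested, so a malformed MultiPolygon would yield equally malformed Polygon features.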

stevage commented 5 years ago

I now see that my input GeoJSON is actually defective, and this is the cause of the failure. There is one Multi-MultiPolygon.
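
A defect like this can be caught before encoding by checking how deeply each geometry's `coordinates` array is nested. The sketch below assumes that "Multi-MultiPolygon" means a MultiPolygon whose coordinates are nested one level too deep. The names `EXPECTED_DEPTH`, `hasDepth` and `findMalformedGeometries` are hypothetical and not part of geobuf.

```javascript
// Nesting depth of `coordinates` per geometry type, per RFC 7946.
// A position such as [lon, lat] counts as depth 1.
const EXPECTED_DEPTH = new Map([
  ['Point', 1],
  ['MultiPoint', 2],
  ['LineString', 2],
  ['MultiLineString', 3],
  ['Polygon', 3],
  ['MultiPolygon', 4]
]);

// True if every branch of `coords` bottoms out in a position (an array of
// at least two numbers) at exactly the given depth.
function hasDepth(coords, depth) {
  if (!Array.isArray(coords)) return false;
  if (depth === 1) {
    return coords.length >= 2 && coords.every((n) => typeof n === 'number');
  }
  return coords.every((child) => hasDepth(child, depth - 1));
}

// Walk a GeoJSON object and collect the paths of geometries whose type is
// unknown or whose coordinates have the wrong nesting depth.
function findMalformedGeometries(geojson, path = '$', found = []) {
  if (geojson === null || typeof geojson !== 'object') return found; // null geometry is allowed
  switch (geojson.type) {
    case 'FeatureCollection':
      (geojson.features || []).forEach((feature, i) =>
        findMalformedGeometries(feature, `${path}.features[${i}]`, found));
      break;
    case 'Feature':
      findMalformedGeometries(geojson.geometry, `${path}.geometry`, found);
      break;
    case 'GeometryCollection':
      (geojson.geometries || []).forEach((geometry, i) =>
        findMalformedGeometries(geometry, `${path}.geometries[${i}]`, found));
      break;
    default: {
      const depth = EXPECTED_DEPTH.get(geojson.type);
      if (depth === undefined || !hasDepth(geojson.coordinates, depth)) {
        found.push(`${path} (${geojson.type})`);
      }
    }
  }
  return found;
}
```

Calling `findMalformedGeometries(json)` before `geobuf.encode` and throwing when the returned list is non-empty would replace the out-of-memory crash with a readable error that names the offending feature. The check visits every coordinate, which should be acceptable for a file of this size.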

mourner commented 5 years ago

@stevage thanks for the update! Still marking this as a bug because it should error instead of going out of memory. If you're up to fixing this, I'll welcome a PR :)