mapbox / geojson-merge

Merge multiple GeoJSON files into one FeatureCollection.
ISC License
233 stars 33 forks source link

Fix stream processing #35

Open mkieselmann opened 4 years ago

mkieselmann commented 4 years ago

While trying to merge a large number of GeoJson files in streaming mode I encountered 2 bugs. 1) The merge file did not contain all content of the separate files. This seems to be related to parallel stream processing. At least serializing the processing of the separate file streams fixes it.

2) The contents of the first file argument were missing in the merged file. This was related to the way command line arguments are parsed. -s was parsed as key=value which leads to the fact that the first file was interpreted as value for the s option.

I've tested the changes on macOS with node v13.5.0.

zingi commented 4 years ago

I can confirm that I also had issues while trying to merge several (30+) geojson files. The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

zingi commented 4 years ago

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}
mkieselmann commented 4 years ago

Thanks for the suggestion. I have updated the code.

kevinduke10 commented 4 years ago

I can confirm that I also had issues while trying to merge several (30+) geojson files. The output geojson only containd a fraction of what it should have.

Furthermore I got a memory leak warning:

(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 drain listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:30:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:64:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)
(node:6825) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Stream]. Use emitter.setMaxListeners() to increase limit
    at _addListener (events.js:268:17)
    at Stream.addListener (events.js:284:10)
    at Stream.pipe (internal/streams/legacy.js:86:8)
    at <mypath>/app/node_modules/@mapbox/geojson-merge/index.js:59:14
    at Array.forEach (<anonymous>)
    at Object.mergeFeatureCollectionStream (<mypath>/app/node_modules/@mapbox/geojson-merge/index.js:56:12)
    at <mypath>/app/rasterToVector.js:237:39
    at new Promise (<anonymous>)
    at mergeMultipleGeojsonFiles (<mypath>/app/rasterToVector.js:236:10)
    at AsyncFunction.convertMultiple [as multiple] (<mypath>/app/rasterToVector.js:91:13)

I tried the fixed code of this pull-request and it solved the issue.

Just for the sake of good written code, you could remove the variable declaration, because the variable is actually unused:

function mergeFeatureCollectionStream (inputs) {
    const out = geojsonStream.stringify();
    const streams = inputs.map(file => fs.createReadStream(file));
    new StreamConcat(streams);
        .pipe(geojsonStream.parse())
        .pipe(out);
    return out;
}

Same issue here with 1.2GB in 72 files. This fix cleared it right up!