mbloch / mapshaper

Tools for editing Shapefile, GeoJSON, TopoJSON and CSV files
http://mapshaper.org

Can't run multiple -each commands on large files #603

Closed · colindm closed this 7 months ago

colindm commented 9 months ago

When I import two GeoJSON files, 1.1 GB and 150 MB in size, and run several dozen -each commands that I've generated, it returns this error:

RangeError [ERR_OUT_OF_RANGE]: The value of "length" is out of range. It must be >= 0 && <= 2147483647. Received 2372053989
    at Object.writeSync (node:fs:922:5)
    at Object.writeFileSync (node:fs:2255:26)
    at module.exports [as writeFileSync] (/usr/local/lib/node_modules/mapshaper/node_modules/rw/lib/rw/write-file-sync.js:14:8)
    at cli.writeFile (/usr/local/lib/node_modules/mapshaper/mapshaper.js:11168:10)
    at /usr/local/lib/node_modules/mapshaper/mapshaper.js:11280:13
    at Array.forEach (<anonymous>)
    at _writeFiles (/usr/local/lib/node_modules/mapshaper/mapshaper.js:11267:15)
    at writeFiles (/usr/local/lib/node_modules/mapshaper/mapshaper.js:11241:12)
    at /usr/local/lib/node_modules/mapshaper/mapshaper.js:332:17
    at new Promise (<anonymous>) { code: 'ERR_OUT_OF_RANGE' }

Here's the script:

-i ./CA/input/precincts2002_2020.geojson name=precincts2002_2020
-i ./CA/intermediate/testVotes.geojson name=censusBlocks
-join target=censusBlocks source=precincts2002_2020 fields=* largest-overlap
-rename-fields CNTYVTDprecincts2002_2020=CNTYVTD
-rename-fields precincts2002_2020Pop=POP20
-each 'G02HouD=round(G02HouD*precincts2022_2_popshare * 100) / 100',
      'G02HouR=round(G02HouR*precincts2022_2_popshare * 100) / 100',
      'G02GovD=round(G02GovD*precincts2022_2_popshare * 100) / 100',
      'G02GovR=round(G02GovR*precincts2022_2_popshare * 100) / 100',
      .... about 50 more -each commands ....
      'G10ConO=round(G10ConO*precincts2022_2_popshare * 100) / 100' target=censusBlocks
-o force ./CA/intermediate/precincts2002_2020_popTEST.geojson

I assume it's an out-of-memory error, but running the same script with mapshaper-xl and node --max_old_space_size=8192 doesn't change anything. However, if I run the same script on files that are identical in structure but smaller, it runs fine.

mbloch commented 9 months ago

Hi! This is not an out-of-memory error. The problem is that the output is too large to fit in a Buffer. The commands that you ran increased the size of the output file to more than 2GB. There are a few things you can do to reduce the output file size. One is to reduce the precision in your output coordinates, by adding precision=0.000001 (for example) to the -o command. Another is to add fewer new attributes to each feature, or reduce the length of your property names. As an aside, mapshaper's round() function takes a second argument for the number of decimals in the output, so you can replace round(val * 100) / 100 with round(val, 2).
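For example, combining both of those suggestions with a couple of the fields from your script, the -each expressions and the output command would look something like this:

-each 'G02HouD=round(G02HouD*precincts2022_2_popshare, 2), G02HouR=round(G02HouR*precincts2022_2_popshare, 2), ...' target=censusBlocks

-o force precision=0.000001 ./CA/intermediate/precincts2002_2020_popTEST.geojson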

colindm commented 9 months ago

Thanks for the response. I tried running it with the precision argument when outputting, and I got a different error, one that more closely resembles an out-of-memory error:

Allocating 8 GB of heap memory
[i] Importing: ./CA/input/precincts2002_2020.geojson
[i] Importing: ./CA/intermediate/testVotes.geojson
[join] Joined data from 25,586 source records to 519,387 target records
[join] 21/25607 source records could not be joined

<--- Last few GCs --->

[69511:0x7f94c884d000] 587033 ms: Scavenge (reduce) 7574.8 (7705.4) -> 7573.9 (7705.4) MB, 10.3 / 0.0 ms (average mu = 0.381, current mu = 0.371) allocation failure;
[69511:0x7f94c884d000] 587091 ms: Scavenge (reduce) 7574.8 (7705.4) -> 7573.8 (7705.4) MB, 11.3 / 0.0 ms (average mu = 0.381, current mu = 0.371) allocation failure;
[69511:0x7f94c884d000] 587154 ms: Scavenge (reduce) 7574.8 (7705.4) -> 7573.8 (7705.4) MB, 11.1 / 0.0 ms (average mu = 0.381, current mu = 0.371) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10332dfa5 node::Abort() [/usr/local/bin/node]
 2: 0x10332e195 node::OOMErrorHandler(char const*, bool) [/usr/local/bin/node]
 3: 0x1034b081c v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1036750b5 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x10367969a v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/usr/local/bin/node]
 6: 0x103675da8 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x103672dd0 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x103671e02 v8::internal::Heap::HandleGCRequest() [/usr/local/bin/node]
 9: 0x103612be1 v8::internal::StackGuard::HandleInterrupts() [/usr/local/bin/node]
10: 0x103a721d3 v8::internal::Runtime_StackGuard(int, unsigned long, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0x103e60b79 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/usr/local/bin/node]

But I did run this script with node --max_old_space_size=8192 $(which mapshaper-xl) -verbose, so I'm not sure why it ran out of memory. The files it's dealing with are just under 2GB combined, so it seems strange to me that it would run out of memory if it was actually being allocated 8GB. Maybe I'm missing something, but it seems like mapshaper-xl isn't actually getting more memory on my machine? Is there any way to test this?

mbloch commented 9 months ago

I'm not sure what the best way is to test how much heap memory was actually allocated. Before we go that route, I'd like you to change the way you're running mapshaper. The --max_old_space_size trick only works with mapshaper, not mapshaper-xl, so your command should be:

node --max_old_space_size=8192 $(which mapshaper)

You can accomplish the same thing by running mapshaper-xl directly. Say you wanted to give it 32gb, you'd go:

mapshaper-xl 32gb ...

Also, with a file that large, it's possible that even 8gb is not enough (generally the RAM required to process a file is much larger than the size of the file itself).
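For your files, for example, either of these forms should give the process 8gb of heap:

node --max_old_space_size=8192 $(which mapshaper) -i ./CA/input/precincts2002_2020.geojson name=precincts2002_2020 ...

mapshaper-xl 8gb -i ./CA/input/precincts2002_2020.geojson name=precincts2002_2020 ...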

colindm commented 9 months ago

I ran the command with mapshaper-xl 14gb (I have 16GB of RAM on my laptop) and I got the same error as at the beginning:

RangeError [ERR_OUT_OF_RANGE]: The value of "length" is out of range. It must be >= 0 && <= 2147483647. Received 2355048957

Maybe I'm misreading that error, but doesn't it imply that only 2gb was allocated? Assuming 2147483647 refers to bytes, that equates to about 2.14GB. And I'm running it with the round(val, 2) suggestion you made; not that I expected it to make a difference, but it's worth mentioning. I'll use it in the future regardless since it's much cleaner.

mbloch commented 9 months ago

Hi! A gigabyte is 1,073,741,824 bytes, and 2,147,483,647 bytes (2^31 - 1) is the maximum length of a Buffer in Node. This error is occurring because mapshaper places the entire contents of an output file in a Buffer before writing it to disk, and your data does not quite fit. Did you try reducing the precision of the output coordinates using the -o precision= option?
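To put numbers on it: your output came to 2,355,048,957 bytes, which is 2,355,048,957 / 1,073,741,824 ≈ 2.19 GB, so you're only about 200 MB over the limit. Trimming the coordinate precision may well be enough to get you under it.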

[update] Sorry, you said this is the error you got at the beginning... so just to be sure: are you seeing this error when you first run your commands, rather than at the end when mapshaper is writing the output file? If that's the case, could you tell me what version of mapshaper you're running? (mapshaper -v)

colindm commented 9 months ago

This happens at the end, when it's writing the output file. I got it to work by adding -simplify 15% keep-shapes before outputting. It's not ideal, but it'll do for now. Is there any way to increase the Node.js buffer size so it can output larger files?

I'm running mapshaper 0.6.24

mbloch commented 9 months ago

Unfortunately, the maximum buffer size is fixed; there's no way to increase it at run time. This problem could be solved by streaming the output to disk, or by writing the output in chunks. Adding that functionality has not been a priority, because my colleagues and I almost never deal with files this large.
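To give a rough idea of the chunked approach, here is a minimal sketch. It is only an illustration of the technique, not mapshaper's actual code, and it assumes the features are already plain objects in memory:

```js
// Illustrative sketch only (not mapshaper internals): write a GeoJSON
// FeatureCollection one feature at a time, so no single Buffer ever has
// to hold the whole (possibly >2GB) output.
const fs = require('fs');

function writeFeatureCollection(features, path) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(path);
    out.on('error', reject);
    out.on('finish', resolve);
    out.write('{"type":"FeatureCollection","features":[\n');
    features.forEach((feature, i) => {
      // Each chunk is a single serialized feature, far below the limit.
      out.write((i > 0 ? ',\n' : '') + JSON.stringify(feature));
    });
    out.write('\n]}\n');
    out.end(); // 'finish' fires once everything has been flushed to disk
  });
}
```

A real implementation would also respect backpressure (pause when write() returns false and resume on 'drain'), but the point is that each individual write stays far below the 2GB limit even when the total output does not.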

colindm commented 9 months ago

OK, thanks for the responses though, I appreciate it.