omniscale / imposm3

Imposm imports OpenStreetMap data into PostGIS
http://imposm.org/docs/imposm3/latest/
Apache License 2.0

Memory Leak #139

Closed lukasmartinelli closed 7 years ago

lukasmartinelli commented 7 years ago

Context

The latest imposm3 build consumes all the memory on the system until it gets killed (at the import step, not during read).

Expected Behavior

Memory usage should not grow linearly over time.

Actual Behavior

All memory is consumed (whether the machine has 32GB or 54GB) and Linux has to kill the process.

Possible Fix

Steps to Reproduce

Importing the planet file from last week (http://planet.osm.org/pbf/planet-170116.osm.pbf) with the prebuilt imposm3 (https://imposm.org/static/rel/imposm3-0.3.0dev-20170119-353bc5d-linux-x86-64.tar.gz) ends with all memory in the system being consumed until Linux has to kill the process. I was able to reproduce this 3 times.

Killed process 1099 (imposm3) total-vm:38499228kB, anon-rss:26744968kB, file-rss:0kB

Your Environment

lukasmartinelli commented 7 years ago

The last working version is https://imposm.org/static/rel/imposm3-0.3.0dev-20161206-e67cf93-linux-x86-64.tar.gz.

The last two versions, https://imposm.org/static/rel/imposm3-0.3.0dev-20161216-f4ccff0-linux-x86-64.tar.gz and https://imposm.org/static/rel/imposm3-0.3.0dev-20170119-353bc5d-linux-x86-64.tar.gz, both cause the process to be killed due to memory consumption.

olt commented 7 years ago

Do you have some more details? How long does it take to crash? When does it crash (console output)? Does it crash with other files?

I looked at the diffs between e67cf93 and f4ccff0 and didn't find any obvious change that could result in a memory leak.

lukasmartinelli commented 7 years ago

> Does it crash with other files?

I also tried it with the planet dump from the previous week, which shows the same problem.

> Do you have some more details? How long does it take to crash? When does it crash (console output)?

The program doesn't panic; in the end it simply allocates so much memory that Linux has to kill it with the KILL signal (it usually consumed around 4-5GB when writing).

It happens when imposm3 is writing the data to PostGIS (-write) after approx 5-6 hours of writing.

Have you done a full planet import on your production setups since December?

One other variable I have not tested for impact is the PostgreSQL version. The only way to get more information is to run the Go profiler against it and capture a memory profile for further analysis.

olt commented 7 years ago

Is it still importing ways when it runs out of memory? Do you import with `-diff`?

I only imported Europe extracts with these versions and didn't notice any issues.

You can start an import with `-httpprofile 0.0.0.0:6060` and attach to the profiler with `go tool pprof -web http://server:6060/debug/pprof/heap`. This only shows the memory of the Go part, though. The internal cache memory and write buffers of LevelDB are not shown.
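For anyone unfamiliar with the Go profiler: below is a minimal, hypothetical sketch of how a Go program typically exposes these endpoints via the standard `net/http/pprof` package. It illustrates the general mechanism behind an option like `-httpprofile`, not imposm3's actual implementation.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the Go runtime's profiling endpoints so `go tool pprof` can
	// attach to http://server:6060/debug/pprof/heap while the program runs.
	go func() {
		log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
	}()
	select {} // placeholder for the actual import work
}
```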

stirringhalo commented 7 years ago

@lukasmartinelli Did you ever resolve this? I had issues with 55GB of RAM (where IIRC I needed an extra 30GB of swap), but with 128GB of RAM there was no problem. That was just using the regular Docker containers without the dev imposm3, though.

olt commented 7 years ago

I did a test with a Europe extract (I don't have a faster test server at hand for a planet import) and it topped out at 5GB RSS. Planet is twice the size of a Europe extract, so it doesn't look like I can reproduce this exact issue.

The memory usage by the Go part was constantly below 100MB. The difference is memory used by GEOS or LevelDB. Interestingly, the RSS dropped to 100MB after the -write was complete and Imposm was creating indices.
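To illustrate why the Go-side numbers can stay below 100MB while the process RSS reaches several GB, here is a small standalone sketch (not imposm3 code) that prints only what the Go runtime accounts for. Memory allocated by cgo-backed libraries such as GEOS or LevelDB never shows up in these counters or in the pprof heap profile, only in the process RSS.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// Only memory managed by the Go runtime is reported here. Memory
	// allocated by C libraries (GEOS, LevelDB) is part of the process RSS
	// but invisible to these counters and to pprof's heap profile.
	fmt.Printf("Go heap in use:        %d MiB\n", m.HeapInuse>>20)
	fmt.Printf("Go memory from the OS: %d MiB\n", m.Sys>>20)
}
```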

dwdc commented 7 years ago

@olt I encountered the same problem as @stirringhalo and @lukasmartinelli. Here are some more logs for reference:

Setup: CPU: i7-2600, RAM: 32GB DDR3, OS: CentOS 7.3

libraries tried:

  1. Go 1.6.3 + GEOS 3.5.0 + leveldb + postgis 2.2 + PostgreSQL 9.6
  2. Go 1.6.3 + GEOS 3.6.1 + hyperleveldb + postgis 2.3.1 + PostgreSQL 9.6

Whether I load Europe.osm.pbf or the planet file, I get a SIGKILL from the OS. RSS always grows into the 24GB range before the SIGKILL.


**$> ps -ef | grep imposm (before crash)**
root     14014  379 74.9 29131596 24532212 tty6 Sl+ 08:22 408:20 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  378 74.9 28984640 24538464 tty6 Sl+ 08:22 408:32 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  378 74.9 29135324 24544424 tty6 Dl+ 08:22 408:44 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  377 74.9 29196104 24549100 tty6 Sl+ 08:22 408:56 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  377 74.9 29133604 24549116 tty6 Sl+ 08:22 409:08 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  376 74.9 29149664 24550068 tty6 Dl+ 08:22 409:20 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  376 74.9 29137068 24550960 tty6 Sl+ 08:22 409:33 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  375 74.9 29167608 24555628 tty6 Sl+ 08:22 409:50 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  375 74.9 29118164 24558148 tty6 Sl+ 08:22 410:02 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize
root     14014  374 74.9 29152312 24559864 tty6 Dl+ 08:22 410:22 imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize

I also captured a perf report, for reference, that indicates heavy swapping caused by imposm3 (excerpt below):

Samples: 2M of event 'cycles:ppp', Event count (approx.): 6759045766152
  Children      Self  Command          Shared Object            Symbol
+   35.43%     0.07%  swapper          [kernel.kallsyms]        [k] cpu_startup_entry
+   34.74%     0.01%  swapper          [kernel.kallsyms]        [k] arch_cpu_idle
+   34.72%     0.05%  swapper          [kernel.kallsyms]        [k] cpuidle_idle_call
+   32.93%     0.14%  swapper          [kernel.kallsyms]        [k] cpuidle_enter_state
+   32.26%    32.26%  swapper          [kernel.kallsyms]        [k] intel_idle
+   31.06%     0.00%  swapper          [kernel.kallsyms]        [k] start_secondary
+    6.62%     0.00%  imposm3          [unknown]                [.] 0000000000000000
+    4.45%     0.01%  imposm3          [kernel.kallsyms]        [k] system_call_fastpath
+    4.44%     0.00%  swapper          [kernel.kallsyms]        [k] rest_init
+    4.44%     0.00%  swapper          [kernel.kallsyms]        [k] start_kernel
+    4.44%     0.00%  swapper          [kernel.kallsyms]        [k] x86_64_start_reservations
+    4.44%     0.00%  swapper          [kernel.kallsyms]        [k] x86_64_start_kernel
+    2.49%     0.00%  imposm3          [unknown]                [.] 0x00002825048b4864
+    2.34%     0.01%  imposm3          [kernel.kallsyms]        [k] page_fault
+    2.33%     0.02%  imposm3          [kernel.kallsyms]        [k] do_page_fault
+    2.29%     0.06%  imposm3          [kernel.kallsyms]        [k] __do_page_fault
+    2.25%     0.00%  imposm3          [unknown]                [.] 0x00007f135eecf0d0
+    2.25%     0.00%  imposm3          libstdc++.so.6.0.19      [.] std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::~basic_ostringstream
+    2.17%     1.10%  imposm3          libhyperleveldb.so       [.] leveldb::Block::Iter::Seek
+    1.91%     0.09%  imposm3          [kernel.kallsyms]        [k] handle_mm_fault
+    1.82%     1.75%  imposm3          imposm3                  [.] runtime.mallocgc
+    1.69%     0.01%  imposm3          libc-2.17.so             [.] __GI___munmap
+    1.66%     0.02%  imposm3          [kernel.kallsyms]        [k] do_read_fault.isra.42
+    1.45%     1.30%  imposm3          libstdc++.so.6.0.19      [.] std::__ostream_insert<char, std::char_traits<char> >
+    1.44%     1.41%  imposm3          imposm3                  [.] runtime.scanobject
+    1.40%     0.00%  imposm3          [kernel.kallsyms]        [k] sys_munmap
+    1.39%     0.15%  imposm3          libhyperleveldb.so       [.] leveldb::ReadBlock
+    1.39%     0.00%  imposm3          [kernel.kallsyms]        [k] vm_munmap
+    1.38%     1.31%  imposm3          imposm3                  [.] runtime.findrunnable
+    1.37%     0.01%  imposm3          [kernel.kallsyms]        [k] __do_fault
+    1.35%     0.02%  imposm3          [kernel.kallsyms]        [k] xfs_filemap_fault
     1.32%     1.32%  imposm3          [kernel.kallsyms]        [k] nmi
+    1.32%     0.17%  imposm3          imposm3                  [.] runtime.asmcgocall
+    1.26%     0.02%  imposm3          [kernel.kallsyms]        [k] do_munmap
+    1.26%     0.05%  imposm3          [kernel.kallsyms]        [k] filemap_fault
+    1.14%     0.01%  imposm3          [kernel.kallsyms]        [k] unmap_region
+    1.07%     1.04%  imposm3          libc-2.17.so             [.] _int_malloc
+    1.06%     0.02%  imposm3          imposm3                  [.] runtime.futex
+    1.00%     0.99%  imposm3          imposm3                  [.] runtime.procyield
+    0.96%     0.00%  imposm3          [unknown]                [.] 0x0000000000000025

I will try downgrading PostgreSQL and other libraries to see if that helps.

olt commented 7 years ago

It's really hard to debug this without the exact commands and mapping files.

@lukasmartinelli @stirringhalo @dwdc Are you using webmerc_area? (Can you add a thumbs up reaction if you do, and thumbs down if not?)

dwdc commented 7 years ago

@olt Here's the command I used to run imposm.

imposm3 import -cachedir cache -mapping mapping/mapping.yaml -write -connection postgis://osm:osm@localhost/osm -optimize

Mapping file is built out of this package: https://github.com/openmaptiles/openmaptiles

Can't upload the file now, will do so when I return to my PC.

Edit: I tried disabling the OOM killer today and ran a test. PostGIS always raised an out-of-memory error in the query `COPY "import"."osm_aeroway_polygon" ("osm_id", "geometry", "aeroway", "area") FROM STDIN`, no matter how I tuned postgresql.conf. Could this be an issue with bulk-load sizing?

Furthermore, the smaller the `shared_buffers` setting in PostgreSQL, the sooner the error occurs.

Attached mapping.yaml

    generalized_tables:
      aeroway_polygon_gen1: {source: aeroway_polygon, sql_filter: area>60000, tolerance: 20.0}
      aeroway_polygon_gen2: {source: aeroway_polygon_gen1, sql_filter: area>240000, tolerance: 50.0}
      boundary_linestring_gen1: {source: boundary_linestring, tolerance: 30.0}
      boundary_linestring_gen2: {source: boundary_linestring, tolerance: 60.0}
      boundary_linestring_gen3: {source: boundary_linestring, tolerance: 120.0}
      boundary_linestring_gen4: {source: boundary_linestring, tolerance: 240.0}
      boundary_linestring_gen5: {source: boundary_linestring, tolerance: 480.0}
      building_polygon_gen1: {source: building_polygon, sql_filter: area>1400.0, tolerance: 10.0}
      highway_linestring_gen1: {source: highway_linestring, sql_filter: 'highway IN (''motorway'',''trunk'', ''primary'', ''secondary'', ''tertiary'') AND NOT is_area', tolerance: 20.0}
      highway_linestring_gen2: {source: highway_linestring_gen1, sql_filter: 'highway IN (''motorway'',''trunk'', ''primary'', ''secondary'') AND NOT is_area', tolerance: 50.0}
      highway_linestring_gen3: {source: highway_linestring_gen2, sql_filter: 'highway IN (''motorway'',''trunk'', ''primary'') AND NOT is_area', tolerance: 120.0}
      highway_linestring_gen4: {source: highway_linestring_gen3, sql_filter: 'highway IN (''motorway'',''trunk'') AND NOT is_area', tolerance: 200.0}
      landcover_polygon_gen1: {source: landcover_polygon, sql_filter: area>60000, tolerance: 20.0}
      landcover_polygon_gen2: {source: landcover_polygon_gen1, sql_filter: area>240000, tolerance: 50.0}
      landcover_polygon_gen3: {source: landcover_polygon_gen2, sql_filter: area>480000, tolerance: 80.0}
      landcover_polygon_gen4: {source: landcover_polygon_gen3, sql_filter: area>1200000, tolerance: 120.0}
      landcover_polygon_gen5: {source: landcover_polygon_gen4, sql_filter: area>4200000, tolerance: 200.0}
      landcover_polygon_gen6: {source: landcover_polygon_gen5, sql_filter: area>15000000, tolerance: 300.0}
      landuse_polygon_gen1: {source: landuse_polygon, sql_filter: area>60000, tolerance: 20.0}
      landuse_polygon_gen2: {source: landuse_polygon_gen1, sql_filter: area>240000, tolerance: 40.0}
      landuse_polygon_gen3: {source: landuse_polygon_gen2, sql_filter: area>960000, tolerance: 80.0}
      landuse_polygon_gen4: {source: landuse_polygon_gen3, sql_filter: area>2000000, tolerance: 160.0}
      park_polygon_gen1: {source: park_polygon, sql_filter: area>60000, tolerance: 20.0}
      park_polygon_gen2: {source: park_polygon_gen1, sql_filter: area>240000, tolerance: 50.0}
      park_polygon_gen3: {source: park_polygon_gen2, sql_filter: area>480000, tolerance: 80.0}
      park_polygon_gen4: {source: park_polygon_gen3, sql_filter: area>1200000, tolerance: 120.0}
      park_polygon_gen5: {source: park_polygon_gen4, sql_filter: area>4200000, tolerance: 200.0}
      park_polygon_gen6: {source: park_polygon_gen5, sql_filter: area>15000000, tolerance: 300.0}
      park_polygon_gen7: {source: park_polygon_gen6, sql_filter: area>60000000, tolerance: 400.0}
      park_polygon_gen8: {source: park_polygon_gen6, sql_filter: area>240000000, tolerance: 600.0}
      railway_linestring_gen1: {source: railway_linestring, sql_filter: railway='rail' AND service='', tolerance: 20.0}
      railway_linestring_gen2: {source: railway_linestring_gen1, tolerance: 40.0}
      water_polygon_gen1: {source: water_polygon, sql_filter: area>500000, tolerance: 20.0}
      water_polygon_gen2: {source: water_polygon_gen1, sql_filter: area>1000000, tolerance: 40.0}
      water_polygon_gen3: {source: water_polygon_gen2, sql_filter: area>2500000.0, tolerance: 80.0}
      water_polygon_gen4: {source: water_polygon_gen3, sql_filter: area>5000000.0, tolerance: 160.0}
      water_polygon_gen5: {source: water_polygon_gen4, sql_filter: area>10000000.0, tolerance: 320.0}
      waterway_linestring_gen1: {source: waterway_linestring, sql_filter: waterway IN ('river'), tolerance: 50.0}
      waterway_linestring_gen2: {source: waterway_linestring_gen1, sql_filter: waterway IN ('river'), tolerance: 100.0}
      waterway_linestring_gen3: {source: waterway_linestring_gen2, sql_filter: waterway IN ('river'), tolerance: 200.0}
    tables:
      aeroway_polygon:
        fields:

dwdc commented 7 years ago

By replacing `webmerc_area` with `pseudoarea` in the mapping.yaml, imposm works perfectly, and with even less memory. The culprit seems to be in there. I suggest that people with the same problem replace the type as a workaround.

Is it possibly related? https://github.com/omniscale/imposm3/blob/master/mapping/fields.go#L238

olt commented 7 years ago

Ok, I found a leak in `Geom.Bounds`. This was triggered in the `webmerc_area` field and in the `-limitto` code. My testing with a Europe extract didn't use these options.
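As an illustration of this class of bug, here is a hypothetical cgo sketch using the GEOS C API (it is not imposm3's actual `Geom.Bounds` code): geometries returned by GEOS live on the C heap, so if the `GEOSGeom_destroy_r` calls below were omitted, the memory would keep growing the process RSS while remaining invisible to Go's garbage collector and to the pprof heap profile.

```go
package main

/*
#cgo LDFLAGS: -lgeos_c
#include <stdlib.h>
#include <geos_c.h>
*/
import "C"

import "unsafe"

// envelopeOf computes the envelope of a WKT geometry using the reentrant
// GEOS C API. If either GEOSGeom_destroy_r call were missing, every call
// would leak C-allocated memory that Go's GC and pprof cannot see.
func envelopeOf(ctx C.GEOSContextHandle_t, wkt string) {
	cwkt := C.CString(wkt)
	defer C.free(unsafe.Pointer(cwkt))

	geom := C.GEOSGeomFromWKT_r(ctx, cwkt)
	if geom == nil {
		return
	}
	defer C.GEOSGeom_destroy_r(ctx, geom) // forgetting this leaks C memory

	env := C.GEOSEnvelope_r(ctx, geom)
	if env == nil {
		return
	}
	defer C.GEOSGeom_destroy_r(ctx, env) // forgetting this leaks, too
	// ... read the envelope's coordinates here ...
}

func main() {
	ctx := C.GEOS_init_r()
	defer C.GEOS_finish_r(ctx)
	for i := 0; i < 1000000; i++ {
		envelopeOf(ctx, "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))")
	}
}
```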

Lesson learned: Always insist on a complete list with all steps to reproduce a bug, including configurations.