mapnik / node-mapnik

Bindings to mapnik for node.js
http://mapnik.org/documentation/node-mapnik
BSD 3-Clause "New" or "Revised" License
533 stars 165 forks source link

SIGSEGV with 4.5.x #978

Closed yohanboniface closed 3 years ago

yohanboniface commented 3 years ago

Hi,

I apologize in advance, as this bug report will not be very actionable I'm afraid :(

I've been reported SIGSEGV with kosmtik, and I've been able to reproduce it, but I've not yet been able to produce a minimal test case out of kosmtik.

The issue arise quite often, but with some randomness from the user point of view.

Seems to be related to writing a png. When running kosmtik with gpb, here is the backtrace:

(gdb) backtrace
#0  0x00007ffff7b94738 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
#1  0x00007fffdf72652d in png_write_row () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#2  0x00007fffdf726dbd in png_write_png () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#3  0x00007fffdeb167f1 in void mapnik::save_as_png<std::ostream, mapnik::image_view<mapnik::image<mapnik::rgba8_t> > >(std::ostream&, mapnik::image_view<mapnik::image<mapnik::rgba8_t> > const&, mapnik::png_options const&) () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#4  0x00007fffdeb08d2e in void mapnik::png_saver::operator()<mapnik::image_view<mapnik::image<mapnik::rgba8_t> > >(mapnik::image_view<mapnik::image<mapnik::rgba8_t> > const&) const ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#5  0x00007fffdeac9504 in void mapnik::save_to_stream<mapnik::image_view_any>(mapnik::image_view_any const&, std::ostream&, std::string const&) ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#6  0x00007fffdeaca7c7 in std::string mapnik::save_to_string<mapnik::image_view_any>(mapnik::image_view_any const&, std::string const&) ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#7  0x00007ffff4d86cdd in Execute () at ../src/mapnik_image_view.cpp:282
#8  0x00007ffff4cf1b94 in OnExecute () at /home/travis/build/mapnik/node-mapnik/node_modules/node-addon-api/napi-inl.h:4265
#9  0x000000000136ecde in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#10 0x00007ffff7c06259 in start_thread () from /usr/lib/libpthread.so.0
#11 0x00007ffff7b2f5e3 in clone () from /usr/lib/libc.so.6

I've been able to reproduce this issue with node-mapnik>=4.5.0 (including 4.5.8) running on both NodeJS==14.x and NodeJS==12.x. But I'm unable to reproduce with NodeJS 12.x and node-mapnik==4.4.0. I've not been able to test with NodeJS==14.x and node-mapnik==4.4.0, because the install fails (no binary available in the AWS).

So it seems that the issue appears between 4.4.0 and 4.5.0.

Any clue ? Any idea on how to debug that further ?

Thanks for your help :)

springmeyer commented 3 years ago

Hi @yohanboniface - nice to cross paths with you. From the backtrace it looks as if the memory that is being written to might have become invalid. That could happen if node-mapnik is being ask to encode data from an input buffer but that buffer has since gone out of scope and started to be garbage collected. If you want to share a link to the code in kosmtik that is using node-mapnik I can take a closer look and let you know if this hypothesis might be related.

springmeyer commented 3 years ago

Not sure if its your exact issue @yohanboniface but I've been able to reduce a crashing testcase that @artemp is going to take a look at soon that sounds a bit like this. It is:

var mapnik = require('mapnik');

function getData() {
 const im = mapnik.Image.open('<some PNG IMAGE>');
 return {data: im.data(), height: im.height(), width: im.width()}
}

function scopedFunction() {
  var { data, height, width } = getData();
  var actual = new mapnik.Image(width, height);
  // seems to crash with and without this, but I presume will crash more often with it 
  // but requires node being run with --expose-gc
  gc();
  new mapnik.Image.fromBufferSync(actual.width(), actual.height(), data).encode('png', (err, buffer) => {
    // uncomment this line to avoid crash
    //console.log(data.length)
    console.log('finished, may crash if data has gone out of scope by now');
  });
}

//scopedFunction(); 
for (let i=0;i<100;++i) {
  scopedFunction();  
}
yohanboniface commented 3 years ago

Yay @springmeyer ! Still on board ten years later :)

Thanks and congrats for the quick test case.

The Kosmtik failing code is between here:

https://github.com/kosmtik/kosmtik/blob/master/src/back/MetatileBasedTile.js

and here:

https://github.com/kosmtik/kosmtik/blob/master/src/back/ProjectServer.js#L72-L96

(Sorry for my old school JS :p)

springmeyer commented 3 years ago

Still have not forgot about my paris invite :) But perhaps, if my hypothesis is true, it means that https://github.com/mapnik/node-mapnik/issues/360 has regressed and I owe you a beer this time. @artemp is going to help here and will take a look more tomorrow and might have a new node-mapnik version for you to test.

yohanboniface commented 3 years ago

Still have not forgot about my paris invite :)

Ah ah, me neither, I thought about it yesterday! The invite is still valid, there is no expiration date ;)

might have a new node-mapnik version for you to test.

I'll do :)

artemp commented 3 years ago

@yohanboniface - I've identified the issue (hopefully the one you're after) and I'm working on PR. I'm hoping to have something to test tomorrow. Stay tuned :)

artemp commented 3 years ago

@yohanboniface https://www.npmjs.com/package/mapnik/v/4.5.9-dev - could you give it go? /cc @springmeyer

yohanboniface commented 3 years ago

@artemp sadly, it crashed again :(

#0  0x00007ffff7b94738 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
#1  0x00007fffdf6fcc7d in png_write_row () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#2  0x00007fffdf6fd50d in png_write_png () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#3  0x00007fffdea8e063 in void mapnik::save_as_png<std::ostream, mapnik::image_view<mapnik::image<mapnik::rgba8_t> > >(std::ostream&, mapnik::image_view<mapnik::image<mapnik::rgba8_t> > const&, mapnik::png_options const&) () from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#4  0x00007fffdea7ff34 in void mapnik::png_saver::operator()<mapnik::image_view<mapnik::image<mapnik::rgba8_t> > >(mapnik::image_view<mapnik::image<mapnik::rgba8_t> > const&) const ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#5  0x00007fffdea468dd in void mapnik::save_to_stream<mapnik::image_view_any>(mapnik::image_view_any const&, std::ostream&, std::string const&) ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#6  0x00007fffdea47d07 in std::string mapnik::save_to_string<mapnik::image_view_any>(mapnik::image_view_any const&, std::string const&) ()
   from /home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/lib/libmapnik.so
#7  0x00007ffff4dbbd7d in Execute () at ../src/mapnik_image_view.cpp:285
#8  0x00007ffff4d29ef4 in OnExecute () at /home/travis/build/mapnik/node-mapnik/node_modules/node-addon-api/napi-inl.h:4361
#9  0x000000000136ecde in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#10 0x00007ffff7c06259 in start_thread () from /usr/lib/libpthread.so.0
#11 0x00007ffff7b2f5e3 in clone () from /usr/lib/libc.so.6

Install log:

leonardo:~/C/j/kosmtik (master !$=) npm install                                                                                                                                          695ms

> mapnik@4.5.9-dev install /home/ybon/Code/js/kosmtik/node_modules/mapnik
> node-pre-gyp install --fallback-to-build

[mapnik] Success: "/home/ybon/Code/js/kosmtik/node_modules/mapnik/lib/binding/mapnik.node" is installed via remote
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.1.3 (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.1.3: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 780 packages from 1049 contributors and audited 782 packages in 48.952s

25 packages are looking for funding
  run `npm fund` for details

found 45 vulnerabilities (29 moderate, 14 high, 2 critical)
  run `npm audit fix` to fix them, or `npm audit` for details

Thanks anyway for the quick answer! What could I do to help debuging the issue ?

artemp commented 3 years ago

@yohanboniface must be something else, thanks for testing! Ideally we need a minimal test case. I'll take a look again and let you know, cheers!

artemp commented 3 years ago

@yohanboniface I'd appreciate your assistance to debug this issue, could you create a minimal JS script to trigger this issue or at least tell me exact steps I can follow to replicate locally with kosmtik ?

artemp commented 3 years ago

@yohanboniface Let's try couple things to narrow this issue down:

diff --git a/src/back/Tile.js b/src/back/Tile.js
index 2a890f3..2a4b9d1 100644
--- a/src/back/Tile.js
+++ b/src/back/Tile.js
@@ -30,6 +30,7 @@ class Tile {
     this.setupBounds();
     map.zoomToBox(this.projection.forward([this.minX, this.minY, this.maxX, this.maxY]));
     var im = new mapnik.Image(this.height, this.width);
+    cb.bind(im);
     map.render(im, {scale: this.mapScale || 1, variables: {zoom: this.z}}, cb);
   };

Could you try running kosmtik with the patch ^^ and see if issue persists ?

yohanboniface commented 3 years ago

could you create a minimal JS script to trigger this issue or at least tell me exact steps I can follow to replicate locally with kosmtik ?

I'll try to do both, hopefully we'll make it!

Could you try running kosmtik with the patch ^^ and see if issue persists ?

This patch is for Tile.js, which is the class used when no metatile is to be used (when project.yml contains metatile: 1, vs metatile: 4 for example). And in this case, without metatile, I cannot reproduce the issue (I did not test that before). So the issue is specifically related to metatile based process, which is mainly in MetatileBasedTile.js. If you have a suggestion of patch to be made on this file, I'll try it of course.

In the meantime, I'll try to either find a way to create a minimal test case or to make it easy for you to reproduce within kosmtik.

artemp commented 3 years ago

could you create a minimal JS script to trigger this issue or at least tell me exact steps I can follow to replicate locally with kosmtik ?

I'll try to do both, hopefully we'll make it!

Could you try running kosmtik with the patch ^^ and see if issue persists ?

This patch is for Tile.js, which is the class used when no metatile is to be used (when project.yml contains metatile: 1, vs metatile: 4 for example). And in this case, without metatile, I cannot reproduce the issue (I did not test that before). So the issue is specifically related to metatile based process, which is mainly in MetatileBasedTile.js. If you have a suggestion of patch to be made on this file, I'll try it of course.

In the meantime, I'll try to either find a way to create a minimal test case or to make it easy for you to reproduce within kosmtik.

:+1:

artemp commented 3 years ago

@yohanboniface I managed to replicate by:

diff --git a/test/data/world/project.yml b/test/data/world/project.yml
index e63ec9c..fbef4e3 100644
--- a/test/data/world/project.yml
+++ b/test/data/world/project.yml
@@ -1,5 +1,5 @@
 scale: 1
-metatile: 1
+metatile: 4
 name: "The World"
 bounds:
   - -180

and just running and panning/zooming for a bit

lldb-12 -- node index.js serve ./test/data/world/project.yml

/cc @springmeyer

artemp commented 3 years ago

@yohanboniface Here is another patch, it solves the issue for me locally

diff --git a/src/back/ProjectServer.js b/src/back/ProjectServer.js
index 4ba8293..c906831 100644
--- a/src/back/ProjectServer.js
+++ b/src/back/ProjectServer.js
@@ -84,13 +84,14 @@ class ProjectServer {
           var tile = new tileClass(z, x, y, {size: size, metatile: self.project.metatile(), mapScale: mapScale});
           return tile.render(self.project, map, function (err, im) {
             if (err) return self.raise(err.message, res, release);
-                im.encode('png', function (err, buffer) {
+            im.encode('png', (function (err, buffer) {
               if (err) return self.raise(err.message, res, release);
               res.writeHead(200, {'Content-Type': 'image/png', 'Content-Length': buffer.length});
               res.write(buffer);
               res.end();
               release();
-                });
+              console.log("ImageView.encode");
+            }).bind(im));
           });
         });
     };
yohanboniface commented 3 years ago

Yay! Fixed here too!

Thanks a lot! And sorry that you had to investigate a kosmtik bug :(

I'd never have found it. I'm always a bit uncertain about this usage in JS ( I must say that I'm not doing any javascript/nodejs since years, so it's a bit absurd that I still maintain kosmtik :/ …)

springmeyer commented 3 years ago

Great to know that .bind(im) fixes things @yohanboniface - thanks for testing! However, I will re-open this issue because we consider that a workaround for the underlying bug (in node-mapnik) and have another fix in mind (now that you've proven what the problem is, given that fixes things for you...)

artemp commented 3 years ago

@yohanboniface - Could you try https://www.npmjs.com/package/mapnik/v/4.5.9-dev2 and see if this issue has been fixed (NOTE: this would require removing .bind(m) ? /cc @springmeyer

yohanboniface commented 3 years ago

@artemp sounds good! I've not been able to crash the app again. I confirm that I first removed the bind fix, then rm -rf node_modules then npm install with the dev2 version. Thanks :)

artemp commented 3 years ago

@yohanboniface - v4.5.9 is out, thanks for testing! /cc @springmeyer