mmomtchev / node-gdal-async

Node.js bindings for GDAL (Geospatial Data Abstraction Library) with full async support
https://mmomtchev.github.io/node-gdal-async/
Apache License 2.0

Tracking down a "double free or corruption (!prev)" core dump #89

Status: Closed (potion-cellar closed this issue 11 months ago)

potion-cellar commented 1 year ago

Hello,

We are operating a data pipeline that uses gdal-async and parses thousands of datasets a day, performing translate, warp, and calc operations. We've noticed in the past few months (likely not related to any updates to the libraries, but rather a change in some data providers) that every 8-24 hours we get an uncatchable core dump:

double free or corruption (!prev)

This of course terminates the node event loop and wreaks all sorts of havoc (notably database connections and ORM clients being left hanging) but the big question is where it is coming from (almost surely gdal bindings) and how we are going to debug it (since we can't catch the error).

Before I start adding logging around hundreds of gdal calls to pinpoint where this is occurring, I was wondering whether you have any insights or a good starting point that would make the debugging effort more targeted.
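
One way to keep that logging targeted is a breadcrumb wrapper, sketched below under some assumptions. Nothing in it is part of gdal-async: `traced`, `mark`, `unfinished`, and the log path are hypothetical names chosen for illustration. The idea is to write a line synchronously before and after every wrapped async call, so that after a crash the operations with a BEGIN but no END are the ones that were in flight.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical log location; any path on a local disk would do.
const breadcrumbPath = path.join(os.tmpdir(), `gdal-breadcrumbs-${process.pid}.log`);
const fd = fs.openSync(breadcrumbPath, 'a');
let nextId = 0;

// Synchronous write: the line is handed to the kernel before the wrapped
// call starts, so it is not lost if the process aborts afterwards.
function mark(phase, id, label) {
  fs.writeSync(fd, `${new Date().toISOString()} ${phase} #${id} ${label}\n`);
}

// Wrap any promise-returning function so that it records BEGIN, then END or THROW.
function traced(label, fn) {
  return async (...args) => {
    const id = ++nextId;
    mark('BEGIN', id, label);
    try {
      const result = await fn(...args);
      mark('END', id, label);
      return result;
    } catch (err) {
      mark('THROW', id, label);
      throw err;
    }
  };
}

// Given the text of a breadcrumb log, list the labels that began but never finished.
function unfinished(logText) {
  const open = new Map();
  for (const line of logText.split('\n')) {
    const m = /^\S+ (BEGIN|END|THROW) #(\d+) (.*)$/.exec(line);
    if (!m) continue;
    if (m[1] === 'BEGIN') open.set(m[2], m[3]);
    else open.delete(m[2]);
  }
  return [...open.values()];
}
```

In use, each call site would pass a function that invokes one of the bindings' async methods, with a label that names the operation and the dataset. Because several async operations can be in flight at once, the log narrows the crash down to a small set of candidates rather than a single call.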

This issue has occurred with both the 3.6.x bindings (Node 18) and the new 3.7.0 bindings (Node 20). Ubuntu 22.04.

We are using the asynchronous methods for just about everything, with pretty thorough error handling.

mmomtchev commented 1 year ago

Impossible to tell from the message alone: it could be anywhere, in the bindings' C++ code, in GDAL, or in its dependencies. The module supports being built with ASAN via --enable-asan. You won't be able to use that build in production, but you can probably run your chain through it once to see whether the problem happens every time. Your next step should be to compile the module in debug mode and to enable core dumps. Maybe I will be able to do something with the dump.
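
For running the chain once under observation, a small supervisor can help; the following is only a sketch under stated assumptions. `runIsolated` is a hypothetical helper, not part of gdal-async. It runs a command in a child process and reports whether it exited normally or was killed by a signal. A glibc "double free or corruption" abort normally shows up as SIGABRT, and the parent process survives to record it.

```javascript
const { spawn } = require('node:child_process');

// Run a command in a child process and resolve with how it ended.
// `code` is the exit code (null if killed by a signal) and `signal` is the
// terminating signal name (null on a normal exit).
function runIsolated(command, args = []) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'inherit', 'pipe'] });
    let stderrTail = '';
    child.stderr.on('data', (chunk) => {
      // Keep only the last 4 KiB of stderr; it may contain the allocator
      // message or an ASAN report.
      stderrTail = (stderrTail + chunk.toString()).slice(-4096);
    });
    child.on('error', reject);
    child.on('close', (code, signal) => resolve({ code, signal, stderrTail }));
  });
}
```

A call such as `runIsolated(process.execPath, ['pipeline-worker.js', datasetPath])`, where `pipeline-worker.js` and `datasetPath` are placeholders for your own worker script and input, would then let the parent log `signal === 'SIGABRT'` together with the dataset that triggered it.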

potion-cellar commented 1 year ago

@mmomtchev Thanks. I'll see what I'm able to find.

mmomtchev commented 1 year ago

Probably a duplicate of #90

mmomtchev commented 11 months ago

Reopen this issue if you still get it with 3.7.2