Prototype async function with error reporting

springmeyer commented 10 years ago

My sense is that async functions can wait: once sync bindings are in place with solid tests and some releases are out in the wild then selectively exposing some critical functions as async would be worthwhile.

But we should figure out now how we think async functions will work. In particular error reporting will be important to discuss. This came up in the comment stream in #13.

The questions:

Can CPLSetErrorHandler be set, used, and unset safely per async function: so is is safe to use while in the libuv threadpool?
Can CPLSetThreadLocalConfigOption be used similarly to allow GDAL config options to be modified per async function?
If the answers to the above issues are no, then what options do we have for catching and reporting meaningful errors when async functions hit GDAL errors - Is this a case where the C++ API instead of the C API should be used?

max-mapper commented 9 years ago

Hey just a clarifying question: is the current state of node-gdal that everything is sync (blocks the event loop)? If so it would be nice to mention it in the readme, cheers!

brianreavis commented 9 years ago

@maxogden Yeah – right now it's all blocking, unfortunately. Excellent point about noting it in the readme.. I'll do that now

springmeyer commented 9 years ago

related (in terms of ability to catch errors): http://lists.osgeo.org/pipermail/gdal-dev/2011-February/027837.html

jdesboeufs commented 9 years ago

I have prototyped two methods: firstAsync() and nextAsync(). But I'm not able to write a benchmark test to evaluate the benefit of the async way. It seems these operations are too fast in context of file parsing.

What could be the best methods to implement in that way?

jdesboeufs commented 9 years ago

Ok there is no benefit for local and small files on an idle system, but in production and for remote services and databases it's great!

Step 1: bump to nan 2.0 Step 2 : clean exposed C++ API Step 3 : improve exposed JS API (sync, async, streams ...)

korczis commented 8 years ago

Any news?

jdesboeufs commented 8 years ago

I'll look at this. Any idea for the JS API?

jdesboeufs commented 8 years ago

I can work on this, but I need help. We have to investigate for each method or accessor if there is IO or massive computation behind.

Maybe we can start with an experimental branch, focused on vector datasets?

brianreavis commented 8 years ago

I think adding async will probably be a major version bump so I think we can get pretty aggressive with the API for it. In general, it'd probably be good to follow the node model of:

gdal.someMethod(args, callback)
gdal.someMethodSync(args)

As for what's fine to leave sync forever / what should be async... that's a good question. If you run make test it'll use the list reporter that shows the time each test took. Some good methods to start with seem to be:

gdal.reprojectImage(opts) ->
   gdal.reprojectImageSync(opts)
   gdal.reprojectImage(opts, callback)

band.getStatistics(allow_approximation, force) ->
   band.computeStatisticsSync(opts)
   band.computeStatistics(opts, callback)

band.computeStatistics(allow_approximation) ->
   band.computeStatisticsSync(opts)
   band.computeStatistics(opts, callback)

A few notes though:

The tests don't really test large datasets or common patterns so I'm sure there's a lot in the tests that is fast but should be async.
I have a fear that basically all of node-gdal will need to turn async, especially given that some gdal drivers do network IO (e.g. hopefully #132 in the future). I think the async task in general will take really digging into gdal to figure out when and where it reads from disk/network and when it operates out of memory.
Another big consideration: https://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe

I'd be interested in hearing how people want to use node-gdal with async. Also: of those cases, which can't or shouldn't be solved by offloading the processing/io to another process via node-worker-farm?

max-mapper commented 8 years ago

The general idea of native async addons is to be able to use them in a networked environment where latency matters. For example if you have 100 users doing 100 simultaneous gdal operations. You want to avoid blocking the event loop for extended periods of time and instead process things in increments, deferring to the event loop in-between.

Another important thing is to expose backpressure from any I/O operations. This is very important for networked applications for memory usage. If you have 100 users, 99 of which are on fiber internet and 1 of which is tethered on a 3G mobile hotspot, you will want to write data to that 1 slow socket slower than the 99 fast sockets. If you write data to the socket without slowing down (e.g. no backpressure) then you end up allocating too much memory because you can't flush the socket as fast as you are writing to it. This can be a problem with 1 slow connection, but it gets much worse with many slow connections.

brianreavis commented 8 years ago

@maxogden Yeah, that's the principle. Which point are you addressing?

jdesboeufs commented 8 years ago

I don't know if GDAL2 is thread-safe, but we can ask @rouault. He also work in Paris area.

Other alternatives to native async addons: https://github.com/nodejs/node/pull/2133 https://github.com/audreyt/node-webworker-threads

For my developments with French gov we still use ogr2ogr in production, that we run through child_process.spawn. Our only use case for now is on-the-fly conversion of vector dataset (format+crs).

The next step will be on-the-fly data loading, in a programmatic way. We have a small prototype which use node-gdal. It was my starting point to experiment an async adaptation.

rouault commented 8 years ago

I don't know if GDAL2 is thread-safe, but we can ask @rouault.

I've updated https://trac.osgeo.org/gdal/wiki/FAQMiscellaneous#IstheGDALlibrarythread-safe to reflect the current state of things.

jdesboeufs commented 8 years ago

Hi Even, Thank you for the update!

We plan to implement an async interface in this JS binding. But we have to find methods or accessors which handle heavy computation or IO access. Is there a general rule for that or maybe we could organize a sprint together with French gov sponsorship? Tell me.

rouault commented 8 years ago

But we have to find methods or accessors which handle heavy computation or IO access.

I guess that mostly GDALRasterIO() / GDALDatasetRasterIO() make sense as async method (and other methods perhaps that get statistics or minimum/maximum values). Basically the thumb rule is that all methods are fast, except the ones that deal with pixels. Dataset opening could potentially be slow in a few rare cases, but that's mostly for esoteric drivers (XYZ driver for example) and I wouldn't care about that.

jdesboeufs commented 8 years ago

So can we say that:

properties and get(int n) accessors are pre-computed?
methods as srsName() are computed on-the-fly?

We need to know, for example in this context, what to implement in async/sync mode.

rouault commented 8 years ago

Ah you are talking about the OGR side. Well GetFeatureCount() and GetExtent() can be slow if bForce = TRUE. Depending on drivers, methods that require the feature definition to be established (getting the fields, geometry type, feature iteration) might also be slow if full file inspection is needed

jdesboeufs commented 8 years ago

Ok so implementing both async/sync could be the smarter strategy... Thank you for your answers @rouault, I hope we will see you next months ;)

tjwebb commented 8 years ago

Any movement on this?

brianreavis commented 8 years ago

@tjwebb Unfortunately, no. @brandonreavis and I have been swamped and it's a big undertaking that we haven't had much need for. I think @springmeyer and company are in the same boat. What's your use case? Is it something you can run in another process?

grEvenX commented 7 years ago

I would also be interested in this 👍 Currently we use worker-farm as a solution/work-around for now. We use this library to get elevation in meters for any given location.

jdesboeufs commented 7 years ago

I just copy/paste this interesting resource: https://nodeaddons.com/

I bought the book.

springmeyer commented 7 years ago

it's a big undertaking that we haven't had much need for. I think @springmeyer and company are in the same boat.

Yes, that's correct. We've not had a need for truly async functions in node-gdal. We use node-gdal only for backend and batch data processing where sync functions are sufficient (if not ideal). We don't use node-gdal for low latency situations or any user facing servers.

At Mapbox however we do write quite a bit of C++ in node addons destined for low latency, user facing APIs. Anyone interested in this the https://nodeaddons.com book looks good that @jdesboeufs linked to. We also maintain a living example project that demonstrates best practices for highly efficient node C++ async functions over at https://github.com/mapbox/node-cpp-skel.

In general they key to efficient node C++ addon async functions is being able to pass state to the Node C++ threadpool without needing to lock or otherwise synchronize data structures to make it threadsafe. Because GDAL uses threads itself, and has global data structures, I've never dared/been optimistic that making it work in the Node C++ threadpool would be easy, stable, or worth it. Certainly some functions in GDAL may be safe to use, but I've not investigated in depth.

So, from my perspective - and as the original creater of this ticket - I would be okay with closing this as not actionable/a big lift that may not be needed. The worker-farm solution feels like a good one - it moves GDAL instances into individual processes - which will be robust and avoid any thread safety issues. The challenge with async functions is that they are done via threads and therefore share the memory space and thread locks with all code in that process.

naturalatlas / node-gdal

Prototype async function with error reporting #18