libgeos / geos

Geometry Engine, Open Source
https://libgeos.org
GNU Lesser General Public License v2.1

Allow passing custom memory management functions #704

Open caspervdw opened 2 years ago

caspervdw commented 2 years ago

Providing custom memory management functions would allow dependent libraries to apply their own memory management policies. Especially when small numbers of bytes are repeatedly allocated and deallocated, caching/pooling may give a decent speedup.
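To illustrate the caching/pooling idea, here is a minimal sketch of a cache that recycles fixed-size blocks through a free list instead of returning them to the system allocator. The class name `FixedBlockCache` and everything in the `sketch` namespace are hypothetical and not part of GEOS; the sketch is single-threaded and makes no claim about the speedup it would actually give.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch {

// Hypothetical helper, not a GEOS class: hands out blocks of one fixed size
// and keeps released blocks on an intrusive free list for reuse.
class FixedBlockCache {
    struct Node { Node* next; };

public:
    explicit FixedBlockCache(std::size_t block_size)
        // A free block must be large enough to hold the list link.
        : block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size) {}

    FixedBlockCache(const FixedBlockCache&) = delete;
    FixedBlockCache& operator=(const FixedBlockCache&) = delete;

    ~FixedBlockCache() {
        // Return every cached block to the system allocator.
        while (free_list_ != nullptr) {
            Node* next = free_list_->next;
            std::free(free_list_);
            free_list_ = next;
        }
    }

    void* allocate() {
        if (free_list_ != nullptr) {
            // Fast path: pop a previously released block.
            Node* node = free_list_;
            free_list_ = node->next;
            ++reused_;
            return node;
        }
        void* p = std::malloc(block_size_);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    void deallocate(void* p) noexcept {
        if (p == nullptr) return;
        // Push the block onto the free list instead of freeing it.
        free_list_ = new (p) Node{free_list_};
    }

    std::size_t reused() const noexcept { return reused_; }

private:
    std::size_t block_size_;
    Node* free_list_ = nullptr;
    std::size_t reused_ = 0;
};

// Usage: the second allocation is served from the cache.
inline void usage_example() {
    FixedBlockCache cache(32);
    void* a = cache.allocate();
    cache.deallocate(a);
    void* b = cache.allocate();
    assert(a == b);
    assert(cache.reused() == 1);
    cache.deallocate(b);
}

}  // namespace sketch
```

A library can only benefit from something like this if it routes its allocations through replaceable functions, which is what this issue asks for.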

Also, out-of-memory situations could be handled more gracefully (specifically for pygeos/shapely, we could throw an error instead of going into swap, see https://github.com/shapely/shapely/issues/1349).

Ref. https://trac.osgeo.org/geos/ticket/540

One way to set up such a C interface is well described in the Numpy docs (NEP49): https://numpy.org/neps/nep-0049.html and another one in the Python docs: https://docs.python.org/3/c-api/memory.html#memoryoverview
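As a rough idea of what such an interface could look like, the sketch below uses a table of function pointers plus an opaque context pointer, loosely following the shape of the handler struct in NEP 49. All names here (`AllocHooks`, `set_alloc_hooks`, `lib_alloc`, `lib_free`) are assumptions made for illustration; nothing like this exists in the GEOS C API today.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Hypothetical hook table: every callback receives the caller-supplied ctx.
struct AllocHooks {
    void* ctx;
    void* (*malloc_fn)(void* ctx, std::size_t size);
    void* (*realloc_fn)(void* ctx, void* ptr, std::size_t new_size);
    void (*free_fn)(void* ctx, void* ptr, std::size_t size);
};

// Default hooks simply forward to the system allocator.
inline void* sys_malloc(void*, std::size_t n) { return std::malloc(n); }
inline void* sys_realloc(void*, void* p, std::size_t n) { return std::realloc(p, n); }
inline void sys_free(void*, void* p, std::size_t) { std::free(p); }

inline AllocHooks& current_hooks() {
    static AllocHooks hooks{nullptr, sys_malloc, sys_realloc, sys_free};
    return hooks;
}

// What a client library would call once, before using the engine.
inline void set_alloc_hooks(const AllocHooks& hooks) { current_hooks() = hooks; }

// What the engine itself would call instead of malloc/free.
inline void* lib_alloc(std::size_t n) {
    AllocHooks& h = current_hooks();
    return h.malloc_fn(h.ctx, n);
}
inline void lib_free(void* p, std::size_t n) {
    AllocHooks& h = current_hooks();
    h.free_fn(h.ctx, p, n);
}

// Example client policy: count live allocations through the ctx pointer.
struct Counter { std::size_t live = 0; };

inline void* counting_malloc(void* ctx, std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) ++static_cast<Counter*>(ctx)->live;
    return p;
}
inline void* counting_realloc(void*, void* p, std::size_t n) { return std::realloc(p, n); }
inline void counting_free(void* ctx, void* p, std::size_t) {
    if (p != nullptr) --static_cast<Counter*>(ctx)->live;
    std::free(p);
}

// Usage: install the counting hooks, allocate, release, restore defaults.
inline void usage_example() {
    Counter counter;
    set_alloc_hooks({&counter, counting_malloc, counting_realloc, counting_free});
    void* p = lib_alloc(64);
    assert(p != nullptr && counter.live == 1);
    lib_free(p, 64);
    assert(counter.live == 0);
    set_alloc_hooks({nullptr, sys_malloc, sys_realloc, sys_free});
}

}  // namespace sketch
```

The global hook table is not thread-safe as written; a real design would have to decide whether hooks are per-process or per-context.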

@jorisvandenbossche @hobu

pramsey commented 2 years ago

There is probably some benefit to this in PostGIS too: getting inside the MemoryContext structure means things like configured memory upper limits can be enforced, and small allocations (should) be a little faster.
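A minimal sketch of enforcing an upper limit through a custom allocator is shown below. It is not how PostgreSQL's MemoryContext works; `BudgetedHeap` is a hypothetical class that tracks bytes in use and throws `std::bad_alloc` when a request would exceed the configured budget, so the caller gets an error instead of unbounded growth. It is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace sketch {

// Hypothetical allocator wrapper that refuses requests beyond a byte budget.
class BudgetedHeap {
public:
    explicit BudgetedHeap(std::size_t limit_bytes) : limit_(limit_bytes) {}

    void* allocate(std::size_t n) {
        // used_ never exceeds limit_, so the subtraction cannot wrap around.
        if (n > limit_ - used_) throw std::bad_alloc();
        void* p = std::malloc(n != 0 ? n : 1);
        if (p == nullptr) throw std::bad_alloc();
        used_ += n;
        return p;
    }

    // The caller passes the size back, as in the sized-free style of API.
    void deallocate(void* p, std::size_t n) noexcept {
        if (p == nullptr) return;
        std::free(p);
        used_ -= n;
    }

    std::size_t used() const noexcept { return used_; }

private:
    std::size_t limit_;
    std::size_t used_ = 0;
};

// Usage: the second request would exceed the 100-byte budget and is refused.
inline void usage_example() {
    BudgetedHeap heap(100);
    void* a = heap.allocate(60);
    bool refused = false;
    try {
        heap.allocate(50);
    } catch (const std::bad_alloc&) {
        refused = true;
    }
    assert(refused);
    assert(heap.used() == 60);
    heap.deallocate(a, 60);
    assert(heap.used() == 0);
}

}  // namespace sketch
```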

caspervdw commented 2 years ago

I just realised that the Python allocator will require the GIL (global interpreter lock) to be held. I missed that detail when scanning the docs earlier.

So for the shapely use case this will probably be a reason to keep using the system allocator. Or am I missing something, @jorisvandenbossche?

jorisvandenbossche commented 2 years ago

Yes, that indeed seems a reason we wouldn't want to set up GEOS with Python's memory allocator in Shapely. There is also a Raw Memory Interface that does not require the GIL, but it seems this is just a plain wrapper around malloc et al, so using this wouldn't give any advantage (it's rather meant for overriding this in Python with your custom allocator).

That said, I think this would still be an interesting enhancement to explore, whether it is for external users of GEOS to plug in their own allocator (like PostGIS might do) or to experiment with different memory allocators in GEOS itself. (I would assume that the legwork for both use cases would largely overlap?)

For example, in the Arrow C++ project, we have a configurable MemoryPool (header), and have implementations based on the system allocator, jemalloc and mimalloc. But we default to jemalloc / mimalloc (depending on the OS) because those give better performance (if enabled at build time).

rouault commented 1 year ago

In the GEOS context, a custom memory allocator would require all references to C++ containers to be changed from std::vector<T> to std::vector<T, my_allocator<T>> (cf. https://stackoverflow.com/a/826635), for example the std::vector<Coordinate> vect member of the CoordinateArraySequence class. This could be a fairly invasive change.
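To make the container change concrete, here is a minimal sketch of a standard-conforming allocator plugged into std::vector. `TrackingAllocator`, `bytes_in_use` and the `Coord` struct are stand-ins invented for this example (the real GEOS Coordinate class is not used), and the allocator just wraps malloc while counting bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

namespace sketch {

// Global byte counter so the effect of the allocator is observable.
inline std::size_t& bytes_in_use() {
    static std::size_t n = 0;
    return n;
}

// Minimal C++11-style allocator; all instances are interchangeable.
template <class T>
struct TrackingAllocator {
    using value_type = T;

    TrackingAllocator() noexcept = default;
    template <class U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        const std::size_t bytes = n * sizeof(T);
        void* p = std::malloc(bytes != 0 ? bytes : 1);
        if (p == nullptr) throw std::bad_alloc();
        bytes_in_use() += bytes;
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        bytes_in_use() -= n * sizeof(T);
        std::free(p);
    }
};

template <class T, class U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) noexcept { return false; }

// Stand-in for a coordinate type; not the GEOS Coordinate class.
struct Coord { double x, y, z; };

// The allocator is part of the type, so every declaration has to change.
using CoordVector = std::vector<Coord, TrackingAllocator<Coord>>;

// Usage: memory is accounted for while the vector lives and released after.
inline void usage_example() {
    {
        CoordVector v;
        v.reserve(4);
        v.push_back({1.0, 2.0, 3.0});
        assert(bytes_in_use() >= 4 * sizeof(Coord));
    }
    assert(bytes_in_use() == 0);
}

}  // namespace sketch
```

Note that `CoordVector` and `std::vector<Coord>` are distinct, incompatible types, which is why the change ripples through every signature that passes such a container around.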

In Arrow C++, as far as I can see, the MemoryPool infrastructure is only used for "big" allocations (that is, for the content of Arrow arrays), which are done manually through it, but I don't see it used for standard C++ containers (probably because they don't consume a lot of memory).

Maxxen commented 1 year ago

I would be seriously interested in attempting to pick this up. Are there any other challenges that immediately come to mind? I suppose the C API would have to be extended as well.

dbaston commented 1 year ago

Not to my mind. I looked into it a while ago, and to my untrained eye this looked like a reasonable way to do the implementation: https://github.com/aws/aws-sdk-cpp/tree/bb1fdce01cc7e8ae2fe7162f24c8836e9d3ab0a2/aws-cpp-sdk-core/include/aws/core/utils/memory