cldellow closed this 8 months ago
That's superb!
With North America I get an even bigger memory saving - down from 20.1GB to 13.2GB (using `--store` and with shapefiles). I think about 4GB of that is ClipCache so there may be some opportunity to optimise further, e.g. by using an LRU cache as you mention in the comments.
On my usual machine, 40.2GB (with `--store`) for the full planet including the poles, 5hr12m22. Really happy with that.
It was sitting on one thread for about 10/15 minutes at the end, which I'm guessing was crunching through a particularly complex set of geometries (very possibly at the poles).
> I think about 4GB of that is ClipCache
Oh, wow. I hadn't thought too critically about this, but I didn't expect it to use quite that much memory.
The cache size does increase with more cores, so perhaps this is expected - for 24 cores, 4 GB is "only" 160MB/core.
I don't often test with the shapefiles, I should probably start doing that, especially as I think the runtime overhead is much smaller now (and especially after #614 lands).
> so there may be some opportunity to optimise further, e.g. by using an LRU cache as you mention in the comments.
I imagined the LRU cache would help with runtime, but not with memory usage. The benefit I was hoping for is that we'd more often have relevant things in the cache, but I wasn't planning on shrinking the cache.
The cache might still be oversized, though. I didn't really experiment at all to find where the point of diminishing returns started, I just picked a number, observed an improvement and went with it.
> It was sitting on one thread for about 10/15 minutes at the end
I also noticed this, and I noticed that GB had a similar straggler on the Hetzner box (Ubuntu 22.04).
Interestingly, the GB straggler doesn't appear when I run locally (Ubuntu 20.04) or when I ran on the Hetzner box prior to the #611 change.
So I think I concluded that something in #611 causes the slowdown, but only on more modern Boost versions than the one in Ubuntu 20.04.
According to https://packages.ubuntu.com/search?keywords=libboost-, 20.04 has 1.71 and 22.04 has 1.74.
Nothing in Boost's changelog jumps out at me, though.
> The cache might still be oversized, though. I didn't really experiment at all to find where the point of diminishing returns started, I just picked a number, observed an improvement and went with it.
I did have a very brief attempt at downsizing the cache size and being a bit pickier about what we put in it (e.g. no simple polygons with <20 vertices) but it removed the performance gains. But there may still be possibilities along those lines.
> So I think I concluded that something in https://github.com/systemed/tilemaker/pull/611 causes the slowdown, but only on more modern Boost versions than the one in Ubuntu 20.04.
In order of likelihood:

1. `geom::correct` before `simplify_combine` (geom.cpp) will have some overhead. It's possible that this is pathological in the case of particularly large polygons and we should call our own `make_valid` (aka `geometry::correct`) in these cases instead, but we'd have to be careful not to reintroduce the invalid geometries this was intended to fix.
2. `BOOST_GEOMETRY_NO_ROBUSTNESS` behaviour
3. `RemovePartsBelowSize` is taking a while (this should be easy to test - just remove `filter_below` and `filter_area` from the JSON config)

This looks good to merge to me - ok with that?
Yup!
Excellent - thank you again!
Quick issue you might be able to help me with...
I've been playing around with a branch that can load GeoJSON files in the same way that we use shapefiles. (GeoJSON is easy to generate, and I quite often find myself writing little Ruby scripts that turn some data or other into GeoJSON.) At the same time, I've moved the shapefile code into its own class. It's at https://github.com/systemed/tilemaker/compare/master...geojson
On shutdown, after the mbtiles/pmtiles file has been written, it terminates (on Mac) with:
```
Filled the tileset with good things at /Users/richard/Maps/planet/oxfordshire.pmtiles
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
```
I've run valgrind over it and that shows:
```
==57979== Warning: set address range perms: large range [0x11fc54028, 0x12fc54058) (noaccess)
==57979== Invalid read of size 8
==57979== at 0x100F41137: pthread_mutex_lock (in /usr/lib/system/libsystem_pthread.dylib)
==57979== by 0x100237AAF: LeasedStore<std::__1::vector<boost::geometry::model::multi_linestring<boost::geometry::model::linestring<boost::geometry::model::d2::point_xy<double, boost::geometry::cs::cartesian>, std::__1::vector, mmap_allocator>, std::__1::vector, mmap_allocator>, std::__1::allocator<boost::geometry::model::multi_linestring<boost::geometry::model::linestring<boost::geometry::model::d2::point_xy<double, boost::geometry::cs::cartesian>, std::__1::vector, mmap_allocator>, std::__1::vector, mmap_allocator> > > >::~LeasedStore() (in /usr/local/bin/tilemaker)
==57979== by 0x100C580D1: tlv_finalize (in /usr/lib/system/libdyld.dylib)
==57979== by 0x100D3B6AB: exit (in /usr/lib/system/libsystem_c.dylib)
==57979== by 0x100C583DB: start (in /usr/lib/system/libdyld.dylib)
==57979== by 0x2: ???
==57979== by 0x104F018FE: ???
==57979== by 0x104F01908: ???
==57979== by 0x104F0193E: ???
==57979== Address 0x104ef0e18 is on thread 1's stack
==57979== 67808 bytes below stack pointer
[same error two more times]
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
==57979==
==57979== Process terminating with default action of signal 6 (SIGABRT)
==57979== at 0x100EF0D46: __pthread_sigmask (in /usr/lib/system/libsystem_kernel.dylib)
[etc.]
```
So I think it's something to do with what you've described here:
* remove locks from geometry stores
Rather than locking on each store call, threads lease a range of the
ID space for points/lines/multilines/polygons. When the thread ends,
it returns the lease.
This has some implications:
- the IDs are no longer guaranteed to be contiguous
- shapefiles are a bit weird, as they're loaded on the main
thread -- so their lease won't be returned until program
end. This is fine, just pointing it out.
...and that I've missed something obvious. I can't see what that is (but then it is 1.30am). Any thoughts?
I gave that a quick whirl and didn't repro a crash (granted, on Linux). If it's easy to replicate the command you're running, I can try that.
Speculation: the LeasedStore destructor is running after the ShpMemTiles destructor.
In the OSM case, the LeasedStore destructor runs when the threads for the various ReadPhases are terminated.
In this case, if the reading of GeoJSON and SHP is happening on the main tilemaker thread, it'll run when the thread ends -- which might be after ShpMemTiles has been destroyed.
The overall design is a bit fragile/footgun-y at the moment.
If that's the case, possible fixes could include manually relinquishing the lease once reading finishes, or doing the reads on a worker thread so that the lease is returned when that thread ends.
Ah, I repro a different error if I try that branch with some random geojson file:
```
Reading GeoJSON geojson
Reading shapefile urban_areas
terminate called after throwing an instance of 'std::runtime_error'
  what():  fatal: no available stores to lease
```
If I then wrap https://github.com/systemed/tilemaker/compare/systemed:ae1981b...systemed:a1976d2# in the world's most pointless thread pool:
```cpp
void GeoJSONProcessor::read(class LayerDef &layer, uint layerNum) {
	boost::asio::thread_pool pool(1);
	boost::asio::post(pool, [&]() {
		// Parse the JSON file into a RapidJSON document
		...
	});
	pool.join();
}
```
it runs as expected.
I think the precise behaviour you'd get would depend on the order your shapefiles or geojson files were processed (which I guess depends on your layer orders).
So I think that confirms that the leases either need to be manually relinquished (if running on the same thread), or the reading needs to happen in a separate thread to ensure the leases get returned.
Thanks! That fixes it - I've used the same thread pool approach as with shapefiles and it's fine now. Will do a bit more testing.
This PR has a few things:
The memory reduction comes from leaving most ways in the WayStore and generating the linestring or polygon on-demand at writing time. There are a few tricks to mitigate the cost of rebuilding the geometry -- we only generate on-demand those ways that didn't need to be corrected, and each thread keeps a thread-local cache of linestrings that they recently constructed.
I explored extending this to relations as well, but it seemed like there was much less opportunity with relations: each individual relation is bigger, so reconstructing it is more costly; they're more often in need of correction; and in absolute terms, they're not as big of a memory hog as all the ways. My bag of tricks wasn't able to mitigate the runtime impact. There might be something there, but I'm going to admit defeat for now.
GB, master:
GB, this PR:
GB, this PR with `--materialize-geometries`:
planet-latest, this PR on a 48-core Hetzner CCX63:
tilemaker used 42GB of RAM and 249GB of virtual memory to generate the planet -- so still a bit out of reach for people on smaller machines.