Open mgibbs189 opened 5 months ago
Option 1: gdal requires ludicrous amount of free space (7+ TB)
Does this also happen if you create a .vrt
using gdalbuildvrt
instead of using gdal_merge
?
Then gdal_translate
to pmtiles if you're on 3.8+: https://gdal.org/drivers/vector/pmtiles.html
@bdon here's the best approach I've found thus far:
gdalbuildvrt -o worldcover_v1.vrt Extracted/*.tif
gdal_translate -of MBTiles -co RESAMPLING=NEAREST worldcover_v1.vrt worldcover.mbtiles
pmtiles convert worldcover.mbtiles worldcover.pmtiles --tmpdir=./tmp/
re: gdal's pmtiles
driver (vectors only?):
This driver supports reading and writing PMTiles datasets containing vector tiles, encoded in the MapVector Tiles (MVT) format.
The issue is that gdal_translate is single-threaded. The above process takes 4 days for a 120GB dataset.
If I could process each .tif separately (.tif -> .mbtiles -> .pmtiles), then merge them all together, that could potentially save a TON of time. Your pmtiles
CLI commands seem to be way more optimized than gdal.
The issue is that gdal_translate is single-threaded. The above process takes 4 days for a 120GB dataset.
Maybe https://github.com/mapbox/rio-mbtiles is a multithreaded option? I've been exploring a derivative of that that can generate PMTiles directly.
The rio-mbtiles command is suited for small to medium (~1 GB) raster sources.
It seems like every solution thus far has caveats.
rio-mbtiles
doesn't help much here, because I could just run gdal_translate -of MBTiles
individually on each tif file, then run pmtiles convert
individually on each mbtiles
file to get a bunch ofpmtiles
files.
The problem is still the same... squashing all the pmtiles into one.
Let's assume the simplest use-case (merging pmtiles with no overlaps). What does that process look like to build out pmtiles merge
? Is it technically complex? Could I maybe sponsor it?
Let's assume the simplest use-case (merging pmtiles with no overlaps)
In most cases "no overlaps" is not actually feasible. If you have 2600 GeoTIFFs, and each one is being tiled to zoom 0, then all the resulting .pmtiles
will have data at z=0, meaning pmtiles merge
is not enough, we will also need to do image operations to mosaic tiles. z=0 is an extreme example but it's pretty likely that there is some spatial overlap between lower zooms, unless your use case is specific enough that there's really no overlap.
^ My solution to the above is that any correct solution needs to start with a .vrt
and GDAL handles the raster mosaicing prior to tiling. I suggested rio-mbtiles
because that has parallelism, and if we identify the limits on raster sources we ought to be able to work around them. One solution may be to use gdaladdo
on the VRT prior to tiling.
I'll give rio-mbtiles
another try and will report back.
Thanks for the clarification re: pmtiles merge
.
@bdon For the 120GB dataset, rio mbtiles
is showing 900 hours to finish... on a monster machine with 30 cores processing it.
Again, I love PMTiles but this hasn't been a great experience 🙃
The fastest approach I've found thus far is to create the millions of tiles via gdal2tiles.py
(bleh), then mb-util
to clump them together, then finally pmtiles convert
.
Really, really hoping that you could give us raster users some better tooling, whether it involves pmtiles merge
, some custom rio
extension, etc.
Again, I love PMTiles but this hasn't been a great experience
The problems you are encountering seem to be related to your specific challenging dataset and tiling it, regardless of tile archive format. Could you make your dataset and current tooling available so other can pitch in to help?
It's the ESA WorldCover dataset: https://worldcover2021.esa.int/
The best approach thus far:
gdalbuildvrt
to generate a .vrt from the 2600+ extracted .tifsgdal2tiles.py
to create Tiles directory from the .vrt (millions of png tiles)mb-util
to build an .mbtiles file from the Tiles directorypmtiles convert
to input the .mbtiles file and output a .pmtiles fileEvery other solution I've tried has either failed (see the first comment in this thread), required absurd resources, or would've taken many days (weeks) to complete.
@bdon maybe you could give it a try as a proof-of-concept 😇
Have you looked into using libvips
to join the TIFFs and then generate a tileset? Something like this:
libvips
with arrayjoin
to merge the TIFFs, then with dzsave format=google
to create a zyx
tileset directorypmtiles
with disk_to_pmtiles
to convert that zyx
directory to PMTileslibvips
works as a pipeline. It's very quick and does not require keeping images in memory when manipulating them. There's also a pyvips
binding. I use that.
@larsmaxfield thanks for the suggestion. That's very similar to Option 3
above (I'm hoping to avoid the tileset generation step). There's got to be a better way around it, but the likelihood of a pmtiles merge
command isn't looking promising 😬
If you want the ESA WorldCover as tiles you can modify this script https://github.com/OvertureMaps/overture-tiles/blob/main/profiles/Base.java to convert Overture GeoParquet's ESA Worldcover dataset into tiles.
I have a 120GB raster dataset, split into 2600 geotiffs. Extracting a
.pmtiles
from it hasn't been easy 🙃Would you consider adding a
pmtiles merge
CLI command?See below for the different approaches I've tried, and how a
pmtiles merge
command could help.Option 1: gdal_merge of the geotiffs, then gdal_translate to .mbtiles
[-] Single-threaded [-] painfully slow [-] gdal requires ludicrous amount of free space (7+ TB)
Option 2: Convert each geotiff to its own .mbtiles file, then merge the .mbtiles files together
[+] multi-threaded, minus the merging [-] tippecanoe's
tile-join
doesn't support rastersOption 3: gdal2tiles -> directory of .pngs, then
mb-util
to generate merged .mbtiles, then .pmtiles[+] png creation multi-threaded [-] creating millions of pngs is torture on the filesystem [-] slow
Option 4 (proposed): Convert each geotiff into its own .mbtiles file, then
pmtiles convert
for each, thenpmtiles merge
to stitch all the pmtiles together[+] multi-threaded [+] fast(?)