t-rex-tileserver / t-rex

t-rex is a vector tile server specialized on publishing MVT tiles from your own data
https://t-rex.tileserver.ch/
MIT License

gpkg / gdal parallel performance less than sequential #286

Open roelarents opened 2 years ago

roelarents commented 2 years ago

When serving tiles from a GeoPackage (~85 GiB, ~20 layers), I see that response times are drastically worse when requests are made in parallel. E.g. when I request 6 tiles in parallel, the response time is ~8000 ms, versus ~300 ms when I request them one after another.
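For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two timings could be measured. It is not the script used for the numbers above. The names `Tile`, `time_sequential`, `time_parallel` and `compare` are made up for illustration, and the `fetch` closure is an assumption: it stands in for whatever actually performs the HTTP request for one tile.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Tile coordinates as (z, x, y). Illustrative type, not taken from t-rex.
type Tile = (u32, u32, u32);

/// Calls `fetch` for each tile, one after another, and returns the total wall time.
fn time_sequential<F>(tiles: &[Tile], fetch: F) -> Duration
where
    F: Fn(Tile),
{
    let start = Instant::now();
    for &tile in tiles {
        fetch(tile);
    }
    start.elapsed()
}

/// Calls `fetch` for all tiles at once (one thread per tile) and returns
/// the wall time until the slowest call has finished.
fn time_parallel<F>(tiles: &[Tile], fetch: F) -> Duration
where
    F: Fn(Tile) + Sync,
{
    let start = Instant::now();
    thread::scope(|s| {
        for &tile in tiles {
            let fetch = &fetch;
            s.spawn(move || fetch(tile));
        }
    });
    start.elapsed()
}

/// Runs both variants with the same `fetch` and returns (sequential, parallel).
fn compare<F>(tiles: &[Tile], fetch: F) -> (Duration, Duration)
where
    F: Fn(Tile) + Sync,
{
    let sequential = time_sequential(tiles, &fetch);
    let parallel = time_parallel(tiles, &fetch);
    (sequential, parallel)
}
```

With a healthy server one would expect the parallel wall time to be at most roughly the sequential one. In the case described here it is the other way around.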

I've tried looking at the source code (I'm a Rust rookie), but I couldn't find a hint yet. I've also tried supplying the NOLOCK open option, thinking that the delay might be the overhead of a locked GPKG that multiple clients are trying to open at the same time, even though it should be possible to open an SQLite database concurrently. That helped a little, bringing it down to ~7000 ms.

Does anyone have an idea where the delay in response times might originate?

Edit: The connection is already opened read-only by default. Opening it shared would be unsafe.

Edit2: Made an image with GDAL 3.4.2 and t-rex to try out NOLOCK.

Edit3: The same effect happens with a FlatGeobuf datasource.

roelarents commented 2 years ago

After some profiling I conclude that opening the GDAL Dataset (multiple times, especially concurrently) is the bottleneck. I think that the dataset should be opened once (in the connected or new method) and then shared using the Send concept. However:

pka commented 2 years ago

Thanks for the thorough investigation! I guess that my GDAL driver implementation came earlier than this change in the Rust wrapper. It would be great to have a little proof of concept as a standalone GDAL example.
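To make the idea under discussion concrete, here is a rough sketch of the "open once, then reuse" pattern. It is not the requested GDAL example and makes no GDAL calls: `FakeDataset` is a hypothetical stand-in for a dataset handle, and `serve_tile` and `run_workers` are invented names. The sketch assumes one handle per worker thread, which sidesteps the question of whether a single handle can safely be used from several threads at once.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how often the (presumably expensive) open step runs.
static OPEN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a dataset handle. No GDAL calls are made.
struct FakeDataset {
    path: String,
}

impl FakeDataset {
    fn open(path: &str) -> FakeDataset {
        OPEN_COUNT.fetch_add(1, Ordering::SeqCst);
        FakeDataset { path: path.to_string() }
    }

    fn read_tile(&self, z: u32, x: u32, y: u32) -> String {
        format!("{}:{}/{}/{}", self.path, z, x, y)
    }
}

thread_local! {
    // One lazily opened handle per worker thread.
    // Assumption: the process serves a single datasource path.
    static DATASET: RefCell<Option<FakeDataset>> = RefCell::new(None);
}

/// Serves one tile, opening the dataset only on this thread's first request.
fn serve_tile(path: &str, z: u32, x: u32, y: u32) -> String {
    DATASET.with(|slot| {
        let mut slot = slot.borrow_mut();
        let ds = slot.get_or_insert_with(|| FakeDataset::open(path));
        ds.read_tile(z, x, y)
    })
}

/// Starts `workers` threads that each serve `requests_per_worker` tiles and
/// returns how many times the dataset was opened in total.
fn run_workers(path: &str, workers: usize, requests_per_worker: u32) -> usize {
    let before = OPEN_COUNT.load(Ordering::SeqCst);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(move || {
                for i in 0..requests_per_worker {
                    serve_tile(path, 14, i, i);
                }
            });
        }
    });
    OPEN_COUNT.load(Ordering::SeqCst) - before
}
```

In this sketch the number of opens grows with the number of worker threads, not with the number of requests. Whether the same holds for a real GDAL dataset would still need to be checked.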

roelarents commented 2 years ago

> Would be great to have a little proof of concept as a standalone GDAL example.

(Sorry to bump this. I should have responded earlier.) I agree that would be useful, but I don't have the skills to do that (or most things) in Rust. Or do you mean just a setup with some GPKG and a t-rex config file?

roelarents commented 1 year ago

Our team has been working on this recently because we want to start serving tiles "on-the-fly" with t-rex instead of pre-tiling as we do now. We haven't found the cause in t-rex (Rust is too hard :wink:), but we did build a multi-process t-rex setup with an HTTP server in front. This circumvents the bottleneck. Of course it's not ideal: multithreading (as already supported in t-rex) should be more efficient than spinning up multiple processes.
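As a rough illustration of the workaround, the front server's job boils down to spreading requests over several t-rex processes. The sketch below assumes plain round-robin selection, which may not match the actual setup. `RoundRobin`, `upstream_url`, and the addresses and tile path in the usage note are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Picks one of several backend addresses in turn (round-robin).
struct RoundRobin {
    backends: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn new(backends: Vec<String>) -> RoundRobin {
        assert!(!backends.is_empty(), "at least one backend is required");
        RoundRobin { backends, next: AtomicUsize::new(0) }
    }

    /// Returns the backend that should handle the next request.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        &self.backends[i]
    }
}

/// Builds the upstream URL to which a tile request would be forwarded.
fn upstream_url(lb: &RoundRobin, tile_path: &str) -> String {
    format!("http://{}{}", lb.pick(), tile_path)
}
```

For example, with backends `127.0.0.1:6767` and `127.0.0.1:6768`, successive requests for a path such as `/all/14/1/2.pbf` would alternate between the two processes.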

Perhaps you want to take a look at the difference. We made a simple test setup here. On my machine, multi-processing is faster by a factor of ~5 (same underlying resources).

pka commented 1 year ago

Thanks for the test setup! Is this all.gpkg free to use in a different repo to implement the same test?

roelarents commented 1 year ago

Sure. It's public topography data (from here).