stac-utils / titiler-pgstac

TiTiler + PgSTAC
https://stac-utils.github.io/titiler-pgstac/
MIT License

Performance issues with server-side tile composition #183

Open drnextgis opened 3 weeks ago

drnextgis commented 3 weeks ago

We’re trying to replace our current tile generation method (client-side composition using a /cog endpoint) with a server-side approach using the /searches endpoint. However, this new method is up to five times slower in some tests. Is this because the same level of performance can’t be achieved with titiler-pgstac for server-side tile composition on the same resources?

Here’s a snippet demonstrating the current approach. The issue is that it results in a high number of requests to the web server. We considered switching to /searches as a potential improvement, but so far, we haven’t achieved comparable performance.

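from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import requests
from PIL import Image

# Placeholder host: the original snippet does not show how titiler_url is defined.
titiler_url = "titiler.example.com"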

def stack_images(images):
    width, height = images[0].size
    new_image = Image.new("RGBA", (width, height))
    for img in images:
        new_image.paste(img, (0, 0), img)
    return new_image

def get_urls(search_id, tile, aname="analytic"):
    z, x, y = tile
    assets_url = f"https://{titiler_url}/searches/{search_id}/tiles/WebMercatorQuad/{z}/{x}/{y}/assets"

    response = requests.get(assets_url)
    response.raise_for_status()
    # Each returned item lists its assets; keep the href of the requested one.
    urls = [item["assets"][aname]["href"] for item in response.json()]

    print(f"Number of assets: {len(urls)}")

    return urls

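def get_tile(url, tile):
    """Fetch one PNG tile for a single COG through the /cog endpoint."""
    z, x, y = tile
    tile_url = f"https://{titiler_url}/cog/tiles/{z}/{x}/{y}"
    # Pass the query as params so the asset URL is percent-encoded correctly.
    params = {
        "bidx": [1, 2, 3],
        "format": "png",
        "scale": 2,
        "tileMatrixSetId": "WebMercatorQuad",
        "url": url,
    }

    response = requests.get(tile_url, params=params)
    response.raise_for_status()
    return Image.open(BytesIO(response.content))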

if __name__ == "__main__":
    tile = (10, 173, 407)
    search_id = "b9440824baca3a312082e3814a0f5c1b"
    urls = get_urls(search_id, tile)
    with ThreadPoolExecutor() as executor:
        images = list(executor.map(get_tile, urls, [tile] * len(urls)))
        img = stack_images(images)
        img.save("tile01.png")

$ time python local.py
Number of assets: 75
python local.py  5,18s user 0,12s system 77% cpu 6,815 total

drnextgis commented 2 weeks ago

Here is the configuration we are using:

# GDAL Config
CPL_TMPDIR=/tmp
GDAL_CACHEMAX=75%
GDAL_INGESTED_BYTES_AT_OPEN=32768
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES
GDAL_HTTP_MULTIPLEX=YES
GDAL_HTTP_VERSION=2
VSI_CACHE=TRUE
VSI_CACHE_SIZE=536870912
MOSAIC_CONCURRENCY=1
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_SESSION_TOKEN=

Based on my tests, it’s clear that MOSAIC_CONCURRENCY and GDAL_CACHEMAX have the most significant impact:

| GDAL_CACHEMAX          | MOSAIC_CONCURRENCY | Server-side mosaic | Client-side mosaic |
|------------------------|--------------------|--------------------|--------------------|
| 75% (total mem: 16 GB) | 1                  | 65 s               | 6.989 s            |
| 75% (total mem: 16 GB) | 8 (8 CPUs)         | 11 s               | 6.692 s            |
| 200                    | 8 (8 CPUs)         | 40 s               | 32 s               |
However, even with the same resources, I could not get server-side rendering to match the performance of client-side rendering. Perhaps I need to try with more resources?

All tests were conducted on the same single tile.

When I set MOSAIC_CONCURRENCY=20, the server process gets killed.
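For comparison with the client-side script, the server-side request being timed would presumably be a single call to the tile route of the same search. The sketch below only builds that URL. The host is a placeholder, and the `assets` and `asset_bidx` query parameters are assumptions about how the `analytic` asset and its bands would be selected, mirroring `bidx`, `format`, and `scale` from the `/cog` call in the first comment. Check them against your titiler-pgstac version.

```python
from urllib.parse import urlencode


def search_tile_url(host, search_id, tile, asset="analytic", bands=(1, 2, 3)):
    """Build a tile URL for a registered search (hypothetical helper)."""
    z, x, y = tile
    query = urlencode(
        {
            "assets": asset,  # assumed parameter name
            "asset_bidx": f"{asset}|{','.join(map(str, bands))}",  # assumed syntax
            "format": "png",
            "scale": 2,
        }
    )
    return f"https://{host}/searches/{search_id}/tiles/WebMercatorQuad/{z}/{x}/{y}?{query}"


url = search_tile_url(
    "titiler.example.com",  # placeholder host
    "b9440824baca3a312082e3814a0f5c1b",
    (10, 173, 407),
)
print(url)
```

Timing one `requests.get(url)` against this URL, next to the whole client-side script, keeps the comparison on the same tile and the same bands.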

drnextgis commented 2 weeks ago

From what I understand, titiler-pgstac uses mosaic_reader, so I attempted to rewrite the code from my initial message using it (local-rio-tiler.py):

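from rio_tiler.io import Reader
from rio_tiler.mosaic import mosaic_reader

# `urls` and `tile` come from the first script (get_urls and the tile tuple).
z, x, y = tile

def reader(asset: str, *args, **kwargs):
    # Read a single tile from one asset; mosaic_reader calls this per asset.
    with Reader(asset) as src:
        return src.tile(*args, **kwargs)

img, assets = mosaic_reader(urls, reader, x, y, z, indexes=[1, 2, 3], tilesize=512, threads=8)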

~However, it works much more slowly (30 s vs 7 s).~ Using the same environment variables as for titiler-pgstac, it performs about the same as the /searches endpoint (as expected), but it is still twice as slow as the approach from the initial message of the thread.

$ time python local.py
python local.py  10,11s user 0,10s system 196% cpu 5,201 total

$ time python local-rio-tiler.py
python local-rio-tiler.py  35,21s user 2,09s system 310% cpu 11,996 total

drnextgis commented 2 weeks ago

I have a hypothesis that might explain the observed behavior: in the first case, we use ThreadPoolExecutor solely for I/O-bound tasks (retrieving PNG tiles), whereas mosaic_reader internally uses ThreadPoolExecutor not just to download data but also to reproject it to the tile's CRS and to mosaic the assets, which are CPU-bound tasks. @vincentsarago what do you think?
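The distinction in this hypothesis can be illustrated in isolation. The sketch below is not a benchmark of rio-tiler: `io_task` and `cpu_task` are hypothetical stand-ins, and it only shows that on a standard CPython build with the GIL, threads overlap waiting time well but do little for pure-Python computation. rio-tiler's reads run largely in GDAL's native code, which may release the GIL, so the real picture is likely somewhere between the two cases.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(_):
    # Stand-in for waiting on an HTTP response; sleeping releases the GIL.
    time.sleep(0.2)


def cpu_task(_):
    # Stand-in for pure-Python computation, which holds the GIL.
    return sum(i * i for i in range(200_000))


def timed(task, n=8, workers=8):
    """Run `task` n times on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(n)))
    return time.perf_counter() - start


io_serial = timed(io_task, workers=1)
io_parallel = timed(io_task, workers=8)
cpu_serial = timed(cpu_task, workers=1)
cpu_parallel = timed(cpu_task, workers=8)
print(f"I/O: {io_serial:.2f}s with 1 thread, {io_parallel:.2f}s with 8 threads")
print(f"CPU: {cpu_serial:.2f}s with 1 thread, {cpu_parallel:.2f}s with 8 threads")
```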

drnextgis commented 2 weeks ago

If there's anything I can do to help move this issue forward, please let me know. However, at this point, I'm leaning towards believing it's a design problem, and without refactoring of titiler-pgstac/rio-tiler, there may not be much we can do.

vincentsarago commented 2 weeks ago

I think most of the issue is that you're dealing with a large number of assets (75).

As you mentioned, MosaicBackend/rio-tiler is designed to use threads to distribute the asset reading. As described in https://cogeotiff.github.io/rio-tiler/mosaic/#smart-multi-threading, we try to take a smart approach, but sadly sometimes we can't outsmart the task!
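The "smart multi-threading" idea in the linked rio-tiler documentation is, roughly, to read assets in chunks of `threads` and to stop as soon as the output tile is fully covered. The sketch below is a simplified stand-in, not rio-tiler's code: assets are plain lists where `None` marks nodata, `fetch` is a hypothetical reader, and the first valid value per pixel wins.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_mosaic(assets, fetch, threads=4):
    """Fill a tile from assets in order, reading `threads` assets at a time."""
    tile, used = None, []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for start in range(0, len(assets), threads):
            chunk = assets[start:start + threads]
            # Read one chunk in parallel; pool.map keeps the asset order.
            for asset, pixels in zip(chunk, pool.map(fetch, chunk)):
                if tile is None:
                    tile = [None] * len(pixels)
                filled = False
                for i, value in enumerate(pixels):
                    # Only fill pixels that are still empty.
                    if tile[i] is None and value is not None:
                        tile[i] = value
                        filled = True
                if filled:
                    used.append(asset)
            # Early exit: skip the remaining chunks once every pixel is set.
            if tile is not None and None not in tile:
                break
    return tile, used


# Toy data: the first two assets already cover the whole 4-pixel tile.
data = {
    "a": [1, 1, None, None],
    "b": [None, 2, 2, 2],
    "c": [9, 9, 9, 9],
    "d": [7, 7, 7, 7],
}
reads = []


def fetch(name):
    reads.append(name)  # record which assets were actually read
    return data[name]


tile, used = chunked_mosaic(list(data), fetch, threads=2)
print(tile, used, sorted(reads))
```

In this toy case the second chunk is never read. When a tile genuinely needs most of its assets, as may be the case with 75 of them, the early exit never triggers and every chunk has to be read.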

If your tile needs to be composed of more than a couple of assets, there is no magic!

That said, I'm always interested in seeing whether we can make rio-tiler/titiler better.