stac-utils / stac-fastapi-pgstac

PostgreSQL backend for stac-fastapi using pgstac (https://github.com/stac-utils/pgstac)
MIT License

"use_api_hydrate" in app.settings v.s. pgstac's "nohydrate" conf #133

Open louisstuart96 opened 1 month ago

louisstuart96 commented 1 month ago

Our team is testing eoapi (built on top of stac-fastapi-pgstac) against STAC items with many asset links. We ran into performance problems in search requests that use the 'query' or 'filter' extensions. Our assumption is that the application's hydration setting causes this problem.

https://github.com/stac-utils/stac-fastapi-pgstac/blob/a81e4d76abd2e460882a55b78ac4b2c7e34ff510/stac_fastapi/pgstac/core.py#L164

Here, the app's default setting is use_api_hydrate = False, which in turn becomes nohydrate=false in the PgSTAC query. However, according to the PgSTAC docs, the correct setting should be nohydrate=true:

https://stac-utils.github.io/pgstac/pgstac/#runtime-configurations

mmcfarland commented 1 month ago

The double negative nohydrate=false is a bit cumbersome to reason through here, but I think the default is logical. When "use api" is turned off, nohydrate==false, which means the DB will perform the hydration. If "use api" is turned on, nohydrate==true, so the database will skip the hydration step and it must be performed on the API side.

| use_api_hydrate | nohydrate | Hydration performed in |
| --- | --- | --- |
| false | false | database |
| true | true | API |
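
To make the mapping above concrete, here is a minimal Python sketch. It is not the actual stac-fastapi-pgstac code: `build_search_conf` and `hydration_location` are hypothetical helpers invented for illustration, and the only thing taken from this thread is that the app setting `use_api_hydrate` is passed through as PgSTAC's `nohydrate` conf value.

```python
import json


def build_search_conf(use_api_hydrate: bool) -> dict:
    """Hypothetical helper: derive the PgSTAC runtime conf from the app setting.

    nohydrate=True asks PgSTAC to skip hydration, so the API has to do it.
    nohydrate=False leaves hydration to the database.
    """
    return {"nohydrate": use_api_hydrate}


def hydration_location(use_api_hydrate: bool) -> str:
    """Hypothetical helper: name the component that performs hydration."""
    conf = build_search_conf(use_api_hydrate)
    return "API" if conf["nohydrate"] else "database"


if __name__ == "__main__":
    # Print one row per setting value, mirroring the table above.
    for flag in (False, True):
        row = {
            "use_api_hydrate": flag,
            "conf": build_search_conf(flag),
            "hydrated_in": hydration_location(flag),
        }
        print(json.dumps(row))
```

Read this way, the two flags always carry the same boolean value; only the names point in opposite directions.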

Whether or not the default is right for your setup is a bit subjective, though. In our experience, the option should target where you have the most spare compute. In the Planetary Computer, which has a single, large pgstac database server instance, using DB hydration resulted in high CPU usage there, slowing down queries across the board. We also had a fairly large API cluster, though, so we were able to spread that CPU load across the various nodes and the DB could remain responsive. If you have more compute in your DB server than in your API instances, it may be better to keep hydration in the DB.