opendatacube / datacube-explorer

Web-based exploration of Open Data Cube collections
Apache License 2.0

Unexpected Database Load running cubedash-gen #582

Open omad opened 3 months ago

omad commented 3 months ago

Follow-on question from #581.

We recently noticed a significant spike in Database IO charges against the development DEA Database. We suspect it's due to an oversight that led to cubedash-gen being run hourly instead of 6-hourly, combined with cubedash-gen not coping correctly with the agdc schema tables being manually modified as part of new product development (deleting and re-indexing some ODC Products).

The command executed is cubedash-gen --verbose --no-init-database --refresh-stats --all, which I expected to run very efficiently when no new Datasets had been added. However, it was taking 10 to 20 minutes per run and causing significant load on the database.


Example Airflow log of a failing cubedash-gen run

jeremyh commented 2 months ago

There's a good chance that the == None/is None changes found in #581 caused these performance issues, as the queries used by Explorer would no longer match the underlying indexes.
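
For anyone following along, here is a rough sketch of why swapping == None and is None can change the SQL a query builder emits. It uses a toy Column class as a stand-in for an ORM column (hypothetical, not SQLAlchemy or Explorer code), and the column name archived is only an illustrative assumption. The point is that == can be overloaded to build an IS NULL expression, while is always evaluates to a plain Python bool, so the resulting WHERE clause no longer references the column and cannot line up with an index defined on it.

```python
class Column:
    """Toy stand-in for an ORM column that builds SQL text (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Operator overloading: `col == None` yields a SQL fragment, not a bool.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} = {other!r}"

    # Defining __eq__ removes the default __hash__; restore identity hashing.
    __hash__ = object.__hash__


def where_clause(condition):
    """Render a WHERE clause from a SQL fragment or a plain Python bool."""
    if isinstance(condition, bool):
        # A plain bool carries no column reference at all.
        return "WHERE true" if condition else "WHERE false"
    return f"WHERE {condition}"


archived = Column("archived")  # illustrative column name (assumption)

overloaded = where_clause(archived == None)  # noqa: E711 (intentional)
identity = where_clause(archived is None)    # Python identity test, always False here

print(overloaded)  # WHERE archived IS NULL
print(identity)    # WHERE false
```

Which direction the change in #581 went is not stated here, so treat this only as an illustration of the general mechanism rather than a diagnosis.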