ioos / catalog

IOOS Catalog general repo for documentation and issues
https://ioos.github.io/catalog/
MIT License

CKAN search performance/PG database bloat #90

Open mwengren opened 2 months ago

mwengren commented 2 months ago

The Catalog has been performing slowly: search results, page loads, etc. are all taking a long time.

@benjwadams suspects this is due to database bloat in the PostgreSQL database CKAN uses. CKAN is configured to store old/expired datasets and other entities in _history tables that periodically need to be pruned.

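The recommendation is either to run a vacuum/analyze pass on the database, which requires putting it in a locked state (a VACUUM FULL holds an exclusive lock on each table while rewriting it; a plain VACUUM ANALYZE does not, but it generally does not return space to the operating system either), or to use the pg_repack extension to automate the cleanup.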
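Before picking between the two, it may help to see which tables are actually carrying the most dead rows. Below is a minimal sketch in Python, assuming the rows have already been fetched from PostgreSQL's `pg_stat_user_tables` view with whatever driver is in use. The helper names, the 20% threshold, and the sample table names are hypothetical, and the dead-tuple ratio is only a rough proxy for bloat, not a measurement of it.

```python
from typing import Iterable, List, NamedTuple, Tuple

# relname, n_live_tup and n_dead_tup are columns of pg_stat_user_tables.
BLOAT_QUERY = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
"""


class TableStats(NamedTuple):
    relname: str
    n_live_tup: int
    n_dead_tup: int


def dead_ratio(stats: TableStats) -> float:
    """Fraction of a table's tuples that are dead (0.0 for an empty table)."""
    total = stats.n_live_tup + stats.n_dead_tup
    return stats.n_dead_tup / total if total else 0.0


def bloated_tables(rows: Iterable[Tuple[str, int, int]],
                   threshold: float = 0.2) -> List[TableStats]:
    """Return tables at or above the (hypothetical) threshold, worst first."""
    stats = [TableStats(*row) for row in rows]
    flagged = [s for s in stats if dead_ratio(s) >= threshold]
    return sorted(flagged, key=dead_ratio, reverse=True)


def maintenance_sql(table: str, full: bool = False) -> str:
    """Build a VACUUM statement for one table.

    full=True produces VACUUM FULL, which rewrites the table under an
    exclusive lock; full=False produces a plain, non-blocking VACUUM.
    """
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier
    options = "FULL, ANALYZE" if full else "ANALYZE"
    return f"VACUUM ({options}) {quoted};"


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the Catalog database.
    sample = [("example_history", 100, 900), ("example_live", 1000, 10)]
    for table in bloated_tables(sample):
        print(table.relname, round(dead_ratio(table), 2))
        print(maintenance_sql(table.relname))
```

The statements are only generated here, not executed, so the output can be reviewed before anything is run against the production database.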

@benjwadams to investigate options for maintaining better routine performance. If I got any of the above wrong, Ben, please post on the issue thread the approach you take to address it.

mwengren commented 1 month ago

@benjwadams We've been getting additional user feedback that the Catalog is performing slowly or giving 404 errors.

When you're able, could you please look into options for improving performance, including the approach above or other mitigations possible to improve DB performance? Would increasing the instance size of the DB help, for example?

mwengren commented 1 week ago

During today's meeting, the Catalog was still having performance issues, potentially due to high request volume from bots or web-scraping jobs. @benjwadams, please look into any changes needed to our request filtering rules to reduce overly aggressive request traffic and improve performance.
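One way to check the bot-traffic theory before changing any filtering rules is to tally requests per client from the web server's access log. Below is a rough sketch in Python, assuming the log is in the common "combined" format. The actual log format used by the Catalog deployment is not stated in this thread, and the hint list, function names, and sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumes the "combined" access log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical substrings that often appear in automated user agents.
BOT_HINTS = ("bot", "crawler", "spider", "scrapy", "python-requests")


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count requests per client IP and per user agent; skip unparsable lines."""
    per_ip: Counter = Counter()
    per_agent: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        per_ip[match["ip"]] += 1
        per_agent[match["agent"]] += 1
    return per_ip, per_agent


def looks_automated(agent: str) -> bool:
    """Crude check: does the user agent contain one of the hint substrings?"""
    lowered = agent.lower()
    return any(hint in lowered for hint in BOT_HINTS)


def heavy_clients(per_ip: Counter, min_requests: int) -> List[Tuple[str, int]]:
    """Return (ip, count) pairs with at least min_requests, busiest first."""
    return [(ip, n) for ip, n in per_ip.most_common() if n >= min_requests]


if __name__ == "__main__":
    # Made-up lines using documentation-reserved IP ranges.
    sample = [
        '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /dataset?q=a HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
        '203.0.113.5 - - [10/Oct/2024:13:55:37 +0000] "GET /dataset?q=b HTTP/1.1" 200 2326 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [10/Oct/2024:13:55:38 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    ips, agents = summarize(sample)
    print(heavy_clients(ips, min_requests=2))
    print([agent for agent in agents if looks_automated(agent)])
```

The counts only indicate where traffic is concentrated. Whether a given client should be rate-limited or blocked is still a judgment call, since some heavy clients may be legitimate harvesters.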