turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
https://steampipe.io
GNU Affero General Public License v3.0
6.67k stars 262 forks source link

steampipe postgres size management #4260

Open deivK opened 2 months ago

deivK commented 2 months ago

Hi, we are using the aws plugin to collect some data from AWS. Every day we run some batch queries for each account and store data in another postgres db we use for analysis. The issue is that the steampipe postgres, the one under .steampipe/db/14.2.0, is increasing out of control: the /.steampipe/db/14.2.0/data folder has reached 94 GB.

How can we reduce the size of this db and how can we keep it under control?

Many thanks Davide

pskrbasu commented 2 months ago

Hi @deivK this seems like something abnormal, 94GB is huge 🤯

A few questions:

  1. What steampipe version are you on?
  2. You mentioned having AWS accounts setup, how many accounts do you have/are you querying?
  3. Are you using any other steampipe plugins?
  4. What are the sizes of data and postgres directories inside .steampipe/db/14.2.0?
  5. "and store data in another postgres db we use for analysis" - I want to understand how you do this. Do you use a script?
deivK commented 2 months ago

Hi, thank you for your reply.

  1. We are using Stemapipe v0.21.6
  2. We are querying 35 accounts
  3. we just use the aws plugin
  4. data is 94G, postgres is 127M
  5. we have a bunch of python scripts that, for each account, run batch queries and store the results in another database. Most of those queries are plain select from an aws table (like aws_account, aws_ec2_ami, aws_ec2_instance, ...). This happens every day, once a day.

Thanks again

e-gineer commented 2 months ago

Have you tried analyzing the steampipe postgres database directory to see the main driver of the space usage?

Perhaps follow some steps in this StackOverflow article?

https://dba.stackexchange.com/questions/314079/postgres-how-to-find-where-database-size-growth-is-coming-from

Will be interesting to learn what you see?

deivK commented 2 months ago

Hi all, I started steampipe as a service so that I could connect to the postgres database. I manually ran some VACUUM and the database size dropped to 700M. Maybe, because the service was not running, the autovacuum were not executed? I'm leaving the service running, even if we are running queries as batch queries, and monitoring the db size,

e-gineer commented 2 months ago

Interesting that autovacuum fixed it. How were you running Steampipe that it grew to that size? Keen to learn from your setup / experience...

github-actions[bot] commented 1 day ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.