sipcapture / homer7-docker

HOMER 7 Docker Images
92 stars 86 forks source link

Postgresql database size #89

Open solarmon opened 4 years ago

solarmon commented 4 years ago

Hi,

I'm having problems my dockered Homer 7 setup. It was working and now I am not able to login. Initially login was saying the password is wrong, now I'm getting a '502 Bad Gateway'.

The docker 'sipcapture/webapp' docker imaga seems to be in a 'Restarting' state after trying to restart it, or reboot the VM it is on.

I then noticed that there was no space left on root and the following directory was taking up most of the storage space:

/usr/local/src/homer7-docker/heplify-server/hom7-prom-all/postgres-data/base/24576

I assume this where the SIP captures are actually stored omn the postgresql database.

How do I clean this up to make more space, without being able to login to the Homer GUI? Is there a way to truncate or dropthe SIP captures after a certain date?

Thank you.

solarmon commented 4 years ago

I checked the helplify-server environment variables and rotation should be enable with the HEPLIFYSERVER_DBROTATE and HEPLIFYSERVER_DBDROPDAYS variables:

HEPLIFYSERVER_DBROTATE=true
HEPLIFYSERVER_DBDROPDAYS=5
HEPLIFYSERVER_LOGLVL=info
HEPLIFYSERVER_LOGSTD=true

So I'm not sure why/how the postgresql database has got so large.

solarmon commented 4 years ago

I managed to clear enough storage space to be able to restart the docker services, get it running, and now able to login.

However, how do I now manually delete the old entries?!

lmangani commented 4 years ago

We have reports of the ENV variables not working indeed. The immediate solution is map your own heplify configuration file to the container and specify the values there while we investigate this.

solarmon commented 4 years ago

We have reports of the ENV variables not working indeed. The immediate solution is map your own heplify configuration file to the container and specify the values there while we investigate this.

Hi. Thanks for your response!

I'm new to docker so please could you provide further details on how exactly to do that.

Also, I'm able to access the 'db' docker image now and can use psql to manage the postgres database. I see that the 'homer_data' database has many tables in it - I assume this is whether the SIP captures are kept. Is it just a case of dropping these tables to delete the data, or would that break anything that is referencing them?

Thank you!

lmangani commented 4 years ago

Could you provide the output of docker logs heplify-server

solarmon commented 4 years ago

Hi.

Not sure whether you need all of it, but this is the end of the logs. There are lots of ERR entries for rotator.go.

2020/09/29 08:58:57.367745 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:58:59.365966 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:01.366679 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:03.366140 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:05.366100 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:07.367422 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:09.366574 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:11.367762 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:13.366531 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:15.366537 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:17.366794 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:19.364418 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:21.363446 rotator.go:94: ERR dial tcp: lookup db on 127.0.0.11:53: server misbehaving 2020/09/29 08:59:23.364576 rotator.go:94: ERR dial tcp 172.18.0.3:5432: connect: connection refused 2020/09/29 08:59:25.368256 rotator.go:406: WARN pq: database "homer_data" already exists 2020/09/29 08:59:25.370230 rotator.go:406: WARN pq: role "homer_user" already exists 2020/09/29 08:59:25.371741 rotator.go:406: WARN pq: role "postgres" does not exist 2020/09/29 08:59:25.375747 rotator.go:344: INFO start creating tables (2020-09-29 08:59:25.375725691 +0000 UTC m=+778.360364288) 2020/09/29 08:59:29.303691 heplify-server.go:102: INFO stopping heplify-server... 2020/09/29 08:59:29.338305 udp.go:27: INFO stopping UDP listener on [::]:9060 2020/09/29 08:59:30.303800 tls.go:44: INFO stopping TLS listener on [::]:9060 2020/09/29 08:59:38.572172 rotator.go:357: INFO end creating tables (2020-09-29 08:59:38.572152754 +0000 UTC m=+791.556791401) 2020/09/29 08:59:38.579472 postgres.go:61: INFO postgres connection established 2020/09/29 08:59:38.579495 database.go:75: INFO close postgres channel 2020/09/29 08:59:38.579510 remotelog.go:44: INFO close remotelog channel 2020/09/29 08:59:38.579528 metric.go:69: INFO close metric channel 2020/09/29 08:59:38.579538 heplify-server.go:111: INFO heplify-server has been stopped 2020/09/29 08:59:38.913283 server.go:91: INFO start heplify-server 1.53 with config.HeplifyServer{HEPAddr:"0.0.0.0:9060", HEPTCPAddr:"", HEPTLSAddr:"0.0.0.0:9060", HEPWSAddr:"", ESAddr:"", ESDiscovery:true, ESUser:"", ESPass:"", LokiURL:"http://loki:3100/api/prom/push", LokiBulk:400, LokiTime r:2, LokiBuffer:100000, LokiHEPFilter:[]int{1, 5, 100}, ForceHEPPayload:[]int(nil), PromAddr:"0.0.0.0:9096", PromTargetIP:"", PromTargetName:"", DBShema:"homer7", DBDriver:"postgres", DBAddr:"db:5432", DBUser:"root", DBPass:"homerSeven", DBDataTable:"homer_data", DBConfTable:"homer_config", D BBulk:400, DBTimer:4, DBBuffer:400000, DBWorker:8, DBRotate:true, DBPartLog:"2h", DBPartIsup:"6h", DBPartSip:"2h", DBPartQos:"6h", DBDropDays:5, DBDropDaysCall:0, DBDropDaysRegister:0, DBDropDaysDefault:0, DBDropOnStart:false, Dedup:false, DiscardMethod:[]string(nil), AlegIDs:[]string(nil), C ustomHeader:[]string(nil), SIPHeader:[]string{"ruri_user", "ruri_domain", "from_user", "from_tag", "to_user", "callid", "cseq", "method", "user_agent"}, LogDbg:"", LogLvl:"info", LogStd:true, LogSys:false, Config:"./heplify-server.toml", ConfigHTTPAddr:"", ConfigHTTPPW:"", Version:false, Scri ptEnable:false, ScriptEngine:"lua", ScriptFolder:"", ScriptHEPFilter:[]int{1, 5, 100}, TLSCertFolder:"."} 2020/09/29 08:59:38.913490 prometheus.go:38: INFO expose metrics without or unbalanced targets Could not find toml config file, use flag defaults. 2020/09/29 08:59:40.943668 rotator.go:406: WARN pq: database "homer_data" already exists 2020/09/29 08:59:40.946180 rotator.go:406: WARN pq: role "homer_user" already exists 2020/09/29 08:59:40.947029 rotator.go:406: WARN pq: role "postgres" does not exist 2020/09/29 08:59:40.949883 rotator.go:344: INFO start creating tables (2020-09-29 08:59:40.949871595 +0000 UTC m=+2.045789502) 2020/09/29 08:59:41.267689 rotator.go:357: INFO end creating tables (2020-09-29 08:59:41.267679821 +0000 UTC m=+2.363597725) 2020/09/29 08:59:41.272125 postgres.go:61: INFO postgres connection established 2020/09/29 09:04:38.917968 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8937, Filtered: 0, Error: 0 2020/09/29 09:09:38.917958 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8906, Filtered: 0, Error: 0 2020/09/29 09:14:38.917962 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8914, Filtered: 0, Error: 0 2020/09/29 09:19:38.917886 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8912, Filtered: 0, Error: 0 2020/09/29 09:24:38.917910 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8892, Filtered: 0, Error: 0 2020/09/29 09:29:38.917904 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8916, Filtered: 0, Error: 0 2020/09/29 09:34:38.917864 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8904, Filtered: 0, Error: 0 2020/09/29 09:39:38.917947 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0 2020/09/29 09:44:38.917905 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8914, Filtered: 0, Error: 0 2020/09/29 09:49:38.917816 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8906, Filtered: 0, Error: 0 2020/09/29 09:54:38.917949 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8892, Filtered: 0, Error: 0

Please let me know if there is anything else that is needed.

Thank you.

lmangani commented 4 years ago

The settings seems to propagate correctly. Could you confirm you have data older than 5 days in the DB?

solarmon commented 4 years ago

To be honest, I'm not sure how check for the data, since this particular system does not get any/much SIP traffic - this why it is worrying that storage space has been heavily used by the postgresql database.

I can see that homer_data database contains lost of tables, with the date stamps in their names that are older than 5 days - i.e. from 17th September. I haven't tried to check the size and contents of the tables yet. There are 607 rows.

For example:

List of relations Schema Name Type Owner --------+----------------------------------------+-------+------- public hep_proto_100_default table root public hep_proto_100_default_20200917_0000 table root public hep_proto_100_default_20200917_0200 table root public hep_proto_100_default_20200917_0400 table root
public hep_proto_1_call table root
public hep_proto_1_call_20200917_0000 table root
public hep_proto_1_call_20200917_0200 table root
public hep_proto_1_call_20200917_0400 table root
public hep_proto_1_call_20200917_0600 table root
: public hep_proto_1_default table root public hep_proto_1_default_20200917_0000 table root public hep_proto_1_default_20200917_0200 table root public hep_proto_1_default_20200917_0400 table root public hep_proto_1_default_20200917_0600 table root public hep_proto_1_default_20200917_0800 table root
public hep_proto_1_registration table root
public hep_proto_1_registration_20200917_0000 table root
public hep_proto_1_registration_20200917_0200 table root
public hep_proto_1_registration_20200917_0400 table root
: public hep_proto_35_default table root public hep_proto_35_default_20200917_0000 table root public hep_proto_35_default_20200917_0600 table root public hep_proto_35_default_20200917_1200 table root
public hep_proto_54_default table root
public hep_proto_54_default_20200917_0000 table root
public hep_proto_54_default_20200917_0600 table root
public hep_proto_54_default_20200917_1200 table root
: public hep_proto_5_default table root public hep_proto_5_default_20200917_0000 table root public hep_proto_5_default_20200917_0600 table root public hep_proto_5_default_20200917_1200 table root

What does the different hep_prot_x values means?

solarmon commented 4 years ago

Our root partition is 16G in size and the homer_data database is 11G in size:

homer_data=# SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database; datname | size --------------+--------- postgres | 7577 kB root | 7717 kB template1 | 7577 kB template0 | 7577 kB homer_config | 8257 kB homer_data | 11 GB (6 rows)

I've checked the storage again and it has got full again - after I cleared up only several hundred megs of storage to get it accessible again. This on a setup with hardly any SIP traffic running through it at the moment.

solarmon commented 4 years ago

Please advise whether I can just delete the tables in the homer_data database.

lmangani commented 4 years ago

Sure, that's what the rotation does as well. Did you see any suspect logs on the postgres node?

solarmon commented 4 years ago

I can't use psql now - since there is no more storage... :'(

lmangani commented 4 years ago

I suppose the container maps to a local folder? If so - You should be able to drop data manually and restart to regain control

solarmon commented 4 years ago

I managed to clear some storage - deleted the old kernels - and got access with psql again.

I deleted the tables:

hep_proto_1_default_20200917 hep_proto_1_default_20200918

Do I need to delete any other related/correspond tables? I.e. any other hep_proto_x tables on those dates?

This has given me 2G of storage so I should be OK for a while.

solarmon commented 4 years ago

Hi,

I checked the server this morning and it seems something has happened and I now have 8.3G of free space.

However, looking at the heplify-server logs, I see the following such messages:

2020/09/30 03:44:22.532031 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:26.536029 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:26.537880 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:30.531951 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:30.536009 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:34.535546 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:34.537660 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:38.532047 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:38.536830 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:42.531374 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:42.531555 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:46.534014 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:46.534527 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:50.531989 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:50.532223 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:54.537805 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:54.539538 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:44:58.534055 postgres.go:283: ERR pq: could not extend file "base/24576/120736": No space left on device 2020/09/30 03:44:58.534536 postgres.go:291: ERR pq: Could not complete operation in a failed transaction 2020/09/30 03:45:00.000329 rotator.go:319: INFO run drop job 2020/09/30 03:45:00.213385 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.219928 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.225446 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.230886 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.235984 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.241855 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.246985 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.252332 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.257559 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.262968 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.267984 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.272982 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.278314 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.283454 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.288574 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.293860 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.298996 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.304384 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.309573 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.314994 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.319979 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.324976 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.330305 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.335328 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.340490 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:00.345512 rotator.go:406: WARN pq: could not write init file 2020/09/30 03:45:05.252843 rotator.go:323: INFO finished drop job, next will run at 2020-10-01 03:45:05.252822203 +0000 UTC m=+140301.891340068 2020/09/30 03:46:43.373121 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8896, Filtered: 0, Error: 0 2020/09/30 03:51:43.373071 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8896, Filtered: 0, Error: 0 2020/09/30 03:56:43.373282 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0 2020/09/30 04:01:43.373119 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8926, Filtered: 0, Error: 0 2020/09/30 04:06:43.373205 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8904, Filtered: 0, Error: 0 2020/09/30 04:11:43.373167 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8910, Filtered: 0, Error: 0 2020/09/30 04:16:43.373146 server.go:295: INFO stats since last 5 minutes. PPS: 29, HEP: 8922, Filtered: 0, Error: 0

Checking the docker processes, I see that alertmanager and prometheus were in 'Restarting' state, and restarting them doesn't change the status. Whether this is related to the above log messages, I don't know.

Please advise what those log ERR and WARN messages mean and how to resolve them.

Thank you.

solarmon commented 4 years ago

Looking at the contents of the homer_data tables, it seems most of it is filled with SIP OPTIONS. However, this traffic is not visible in the Homer web app.

I will look in to how to stop SIP OPTIONS from being sent from the HEP source to reduce the storage requirements. But how can I make SIP OPTIONS visible in the Homer web app? Having visibility of SIP OPTIONS can be useful in certain situations.

lmangani commented 4 years ago

SIP Options are of course visible, just in their own transaction. This has been the case since Homer 5 - use the form transaction type.

solarmon commented 4 years ago

Do you mean in the Search form/widget in the Homer web app GUI? I don't see a 'Transaction Type' search field to add to it. Only 'Proto Family', 'Protocol Type', 'Payload Type'.

But even with all the search fields empty - so everything should be included? - it does not show SIP OPTIONS in the Display Results.

Am I looking in the wrong place?

solarmon commented 4 years ago

In the 'SIP Method' search field - only the following methods appears in the drop down menu:

INVITE BYE 100 200 183 CANCEL

Even if I type in 'OPTIONS' manually in to it, it does not show any search results.

lmangani commented 4 years ago

HOMER splits messages by transaction type (call, registration, everything else) in different buckets to allow rotating them at different speeds - Most likely you are searching for out-of-band messages. Try using the SIP - Default protocol bucket.

image

Please confirm if this is your case so we can improve documentation - its all pretty fresh.

solarmon commented 4 years ago

Ah, thank you!

I now see where the 'Proto' field is in the search Settings window. It is currently on 'SIP - call' and I have now changed it to 'SIP - default' and the search SIP Methods and results now include SIP OPTIONS.

I was not aware that this 'Proto' option and that it limits what the results return. I now read it as for the 'SIP Method' search field, if nothing is selected then it will match all methods in that list, which can be different based on what is selected in the search Settings for 'Proto'.

bilalrao12 commented 5 months ago

HEPLIFYSERVER_DBROTATE=true HEPLIFYSERVER_DBDROPDAYS=5

DB rotation works fine using above ENVs, recently tested with (heplify-server version 1.59.6)