Closed: n8twj closed this issue 2 years ago.
Have you checked the first line written in the docker logs when Docker starts the heplify-server container? Does it include your settings correctly?
When I exec into heplify-server, the help screen shows the values, but they are not honored:
-dbusagescheme Change value of DBUsageScheme. (default percentage)
-dbpercentageusage Change value of DBPercentageUsage. (default 75%)
DBDropDaysRegister:1, DBDropDaysDefault:1, DBDropOnStart:true, DBUsageProtection:true, DBUsageScheme:"percentage", DBPercentageUsage:"50%", DBMaxSize:"20GB",
I even changed 75% to 50% in docker-compose.yml, and the change shows up on start.
Neither DBMaxSize nor DBDropOnStart is being honored either.
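For reference, a minimal sketch of how these rotation settings would be passed in docker-compose.yml. The HEPLIFYSERVER_* names follow heplify-server's environment-variable convention and the values mirror this report, but treat the exact spellings as assumptions to verify against the hom7-prom-all example compose file.

```yaml
# Sketch: rotation-related settings on the heplify-server service.
# Variable names assume heplify-server's HEPLIFYSERVER_* env convention;
# values are the ones discussed in this issue, not defaults.
heplify-server:
  image: sipcapture/heplify-server
  environment:
    - "HEPLIFYSERVER_DBROTATE=true"
    - "HEPLIFYSERVER_DBDROPONSTART=true"
    - "HEPLIFYSERVER_DBUSAGEPROTECTION=true"
    - "HEPLIFYSERVER_DBUSAGESCHEME=percentage"
    - "HEPLIFYSERVER_DBPERCENTAGEUSAGE=50%"
    - "HEPLIFYSERVER_DBMAXSIZE=20GB"
```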
This is a one-hour chart, with one hour of calling starting at 09:00 and only 27 calls per second (not even 1/4 of our traffic).
No REGISTER, NOTIFY, SUBSCRIBE, or OPTIONS messages are being sent to HEP.
2022/10/03 13:58:37.449761 server.go:297: INFO stats since last 5 minutes. PPS: 1601, HEP: 480303, Filtered: 0, Error: 0
2022/10/03 14:08:37.449725 server.go:297: INFO stats since last 5 minutes. PPS: 1563, HEP: 468991, Filtered: 0, Error: 0
Are you sure it's not logs or something else filling up the disks? The math doesn't add up.
This is a dedicated instance running only the hom7-prom-all docker-compose example.
$ sudo du -sh *
4.0K    alertmanager
4.0K    bootstrap
8.0K    docker-compose.yml
160K    grafana
4.0K    init-user-db.sh
8.0K    loki
7.9G    postgres-data
8.0K    prometheus
4.0K    README.md
In an hour, postgres-data will be double that size.
$ sudo du -sh *
4.0K    alertmanager
4.0K    bootstrap
8.0K    docker-compose.yml
160K    grafana
4.0K    init-user-db.sh
8.0K    loki
8.6G    postgres-data
8.0K    prometheus
4.0K    README.md
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ date
Mon Oct  3 15:29:52 UTC 2022
(Not even 60 seconds later)
30 CPS
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ sudo du -sh *
4.0K    alertmanager
4.0K    bootstrap
8.0K    docker-compose.yml
160K    grafana
4.0K    init-user-db.sh
8.0K    loki
9.1G    postgres-data
8.0K    prometheus
4.0K    README.md
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ date
Mon Oct  3 15:31:46 UTC 2022
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ sudo du -sh *
4.0K    alertmanager
4.0K    bootstrap
8.0K    docker-compose.yml
160K    grafana
4.0K    init-user-db.sh
8.0K    loki
11G     postgres-data
8.0K    prometheus
4.0K    README.md
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ date
Mon Oct  3 15:36:24 UTC 2022
If in doubt, try adding this to the postgres container in your compose and see if it brings the size down.
logging:
options:
max-size: "10m"
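For context, the complete form of that snippet in a compose file looks roughly like this (json-file is Docker's default logging driver, so naming it explicitly is optional; max-file is an additional, optional cap on rotated files):

```yaml
# Sketch: cap container log size for the db service.
db:
  logging:
    driver: "json-file"   # Docker's default driver
    options:
      max-size: "10m"     # rotate each log file at 10 MB
      max-file: "3"       # keep at most 3 rotated files
```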
so logging by default knows how to access postgres? ...without feeding it any credentials
There is already a logging max-size on every service:
logging:
options:
max-size: "50m"
I asked someone else with significantly more Postgres experience than I have to take a look. He said under no circumstances should Postgres run as "root" or use the "root" database user. The moment we switched away from db user "root", the hard drive space instantly stopped being burnt up.
- "HEPLIFYSERVER_DBUSER=db"
POSTGRES_USER: db
test: ["CMD-SHELL", "psql -h 'localhost' -U 'db' -c '\\l'"]
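Pieced together, the non-root change looks roughly like the sketch below in the compose file. Service names are taken from the recipe; the image tag matches the one shown later in this thread, and the password value is a placeholder.

```yaml
# Sketch of the relevant pieces after switching away from the "root" db user.
db:
  image: postgres:11-alpine
  environment:
    POSTGRES_USER: db            # was "root"
    POSTGRES_PASSWORD: changeme  # placeholder, not the real value
  healthcheck:
    test: ["CMD-SHELL", "psql -h 'localhost' -U 'db' -c '\\l'"]
    interval: 1s
    timeout: 3s
    retries: 30

heplify-server:
  environment:
    - "HEPLIFYSERVER_DBUSER=db"
```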
While the Postgres user issue is an important change... another issue was discovered: somehow the depends_on db requirement wasn't being honored.
Once we restart heplify-server after the rest of the stack is up, this issue returns (and homer-webapp works as expected).
The postgres container used in this recipe is the official one from Postgres. The code you're referring to is just a healthcheck and does not imply postgres is being run as root. I guess this means that with your changes, postgres was never being started, and this was the root cause of the other issues?
But in the official configuration, they do not run the "root" db user; there is even a big red warning if you attempt to do so in the postgres CLI.
We are now back to burning up 25 GiB of hard drive space in about 2 hours, at a maximum of 18 CPS.
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ docker-compose up -d
Creating network "hom7-prom-all_default" with the default driver
Creating grafana ... done
Creating prometheus ... done
Creating nodeexporter ... done
Creating alertmanager ... done
Creating db ... done
Creating heplify-server ... done
Creating homer-webapp ... done
[ec2-user@ip-172-31-42-201 hom7-prom-all]$ docker logs heplify-server
Could not find toml config file, use flag defaults. <nil>
2022/10/05 16:36:26.662120 server.go:93: INFO start heplify-server 1.57 with config.HeplifyServer{HEPAddr:"0.0.0.0:9060", HEPTCPAddr:"0.0.0.0:9061", HEPTLSAddr:"", HEPWSAddr:"", ESAddr:"", ESDiscovery:true, HEPv2Enable:true, ESUser:"", ESPass:"", LokiURL:"", LokiBulk:400, LokiTimer:4, LokiBuffer:100000, LokiHEPFilter:[]int{1, 5, 100}, ForceHEPPayload:[]int(nil), PromAddr:"0.0.0.0:9096", PromTargetIP:"", PromTargetName:"", DBShema:"homer7", DBDriver:"postgres", DBAddr:"db:5432", DBSSLMode:"disable", DBUser:"db", DBPass:"<private>", DBDataTable:"homer_data", DBConfTable:"homer_config", DBBulk:400, DBTimer:4, DBBuffer:400000, DBWorker:8, DBRotate:true, DBPartLog:"2h", DBPartIsup:"6h", DBPartSip:"2h", DBPartQos:"6h", DBDropDays:1, DBDropDaysCall:1, DBDropDaysRegister:1, DBDropDaysDefault:1, DBDropOnStart:true, DBUsageProtection:false, DBUsageScheme:"percentage", DBPercentageUsage:"50%", DBMaxSize:"20GB", DBProcDropLimit:2, Dedup:false, DiscardMethod:[]string(nil), CensorMethod:[]string(nil), AlegIDs:[]string(nil), ForceALegID:false, CustomHeader:[]string(nil), IgnoreCaseCH:false, SIPHeader:[]string{"ruri_user", "ruri_domain", "from_user", "from_tag", "to_user", "callid", "cseq", "method", "user_agent"}, LogDbg:"", LogLvl:"info", LogStd:true, LogSys:false, Config:"./heplify-server.toml", ConfigHTTPAddr:"", ConfigHTTPPW:"", Version:false, ScriptEnable:false, ScriptEngine:"lua", ScriptFolder:"", ScriptHEPFilter:[]int{1, 5, 100}, TLSCertFolder:".", TLSMinVersion:"1.2"}
2022/10/05 16:36:26.662595 prometheus.go:38: INFO expose metrics without or unbalanced targets
2022/10/05 16:36:26.667676 rotator.go:539: INFO start creating tables (2022-10-05 16:36:26.667640402 +0000 UTC m=+0.074928689)
2022/10/05 16:36:26.670003 rotator.go:544: ERR dial tcp 172.18.0.6:5432: connect: connection refused
2022/10/05 16:36:26.674425 rotator.go:547: ERR dial tcp 172.18.0.6:5432: connect: connection refused
2022/10/05 16:36:26.675225 rotator.go:550: ERR dial tcp 172.18.0.6:5432: connect: connection refused
2022/10/05 16:36:26.675356 rotator.go:552: INFO end creating tables (2022-10-05 16:36:26.675335751 +0000 UTC m=+0.082623925)
2022/10/05 16:36:26.677122 rotator.go:555: ERR dial tcp 172.18.0.6:5432: connect: connection refused
2022/10/05 16:36:26.677713 server.go:152: ERR dial tcp 172.18.0.6:5432: connect: connection refused
$ docker logs db
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2022-10-05 16:36:26.288 UTC [31] WARNING: no usable system locales were found
performing post-bootstrap initialization ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
waiting for server to start....2022-10-05 16:36:26.999 UTC [42] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-10-05 16:36:27.017 UTC [43] LOG: database system was shut down at 2022-10-05 16:36:26 UTC
2022-10-05 16:36:27.020 UTC [42] LOG: database system is ready to accept connections
done
server started
CREATE DATABASE
/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/init-user-db.sh
CREATE DATABASE
waiting for server to shut down....2022-10-05 16:36:27.594 UTC [42] LOG: received fast shutdown request
2022-10-05 16:36:27.595 UTC [42] LOG: aborting any active transactions
2022-10-05 16:36:27.596 UTC [42] LOG: background worker "logical replication launcher" (PID 49) exited with exit code 1
2022-10-05 16:36:27.597 UTC [44] LOG: shutting down
2022-10-05 16:36:27.607 UTC [42] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2022-10-05 16:36:27.706 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-10-05 16:36:27.706 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-10-05 16:36:27.709 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-10-05 16:36:27.724 UTC [58] LOG: database system was shut down at 2022-10-05 16:36:27 UTC
2022-10-05 16:36:27.727 UTC [1] LOG: database system is ready to accept connections
2022-10-05 16:36:29.218 UTC [78] ERROR: database "homer_config" already exists
2022-10-05 16:36:29.218 UTC [78] STATEMENT: CREATE DATABASE homer_config OWNER db
The heplify-server can't connect to the postgres container - this is common when things go wrong inside a container, when healthchecks are not working as intended, or both. Perhaps you shouldn't use Docker, and instead install the real components on your system for full control?
I then immediately do:
$ docker restart heplify-server
Could not find toml config file, use flag defaults. <nil>
2022/10/05 16:39:44.048884 server.go:93: INFO start heplify-server 1.57 with config.HeplifyServer{HEPAddr:"0.0.0.0:9060", HEPTCPAddr:"0.0.0.0:9061", HEPTLSAddr:"", HEPWSAddr:"", ESAddr:"", ESDiscovery:true, HEPv2Enable:true, ESUser:"", ESPass:"", LokiURL:"", LokiBulk:400, LokiTimer:4, LokiBuffer:100000, LokiHEPFilter:[]int{1, 5, 100}, ForceHEPPayload:[]int(nil), PromAddr:"0.0.0.0:9096", PromTargetIP:"", PromTargetName:"", DBShema:"homer7", DBDriver:"postgres", DBAddr:"db:5432", DBSSLMode:"disable", DBUser:"db", DBPass:"<private>", DBDataTable:"homer_data", DBConfTable:"homer_config", DBBulk:400, DBTimer:4, DBBuffer:400000, DBWorker:8, DBRotate:true, DBPartLog:"2h", DBPartIsup:"6h", DBPartSip:"2h", DBPartQos:"6h", DBDropDays:1, DBDropDaysCall:1, DBDropDaysRegister:1, DBDropDaysDefault:1, DBDropOnStart:true, DBUsageProtection:false, DBUsageScheme:"percentage", DBPercentageUsage:"50%", DBMaxSize:"20GB", DBProcDropLimit:2, Dedup:false, DiscardMethod:[]string(nil), CensorMethod:[]string(nil), AlegIDs:[]string(nil), ForceALegID:false, CustomHeader:[]string(nil), IgnoreCaseCH:false, SIPHeader:[]string{"ruri_user", "ruri_domain", "from_user", "from_tag", "to_user", "callid", "cseq", "method", "user_agent"}, LogDbg:"", LogLvl:"info", LogStd:true, LogSys:false, Config:"./heplify-server.toml", ConfigHTTPAddr:"", ConfigHTTPPW:"", Version:false, ScriptEnable:false, ScriptEngine:"lua", ScriptFolder:"", ScriptHEPFilter:[]int{1, 5, 100}, TLSCertFolder:".", TLSMinVersion:"1.2"}
2022/10/05 16:39:44.050488 prometheus.go:38: INFO expose metrics without or unbalanced targets
2022/10/05 16:39:44.050575 rotator.go:539: INFO start creating tables (2022-10-05 16:39:44.050561264 +0000 UTC m=+0.013321060)
2022/10/05 16:39:50.271276 rotator.go:552: INFO end creating tables (2022-10-05 16:39:50.271254045 +0000 UTC m=+6.234013863)
2022/10/05 16:39:50.505989 postgres.go:61: INFO postgres connection established
which then makes homer-webapp happy again (i.e. we can search for traces and view them), but also brings back the issue of the database not being rotated.
That's what the healthcheck is for - it holds other containers until the DB is ready. Have you removed it by any chance?
I presume you mean this:
healthcheck:
test: ["CMD-SHELL", "psql -h 'localhost' -U 'db' -c '\\l'"]
interval: 1s
timeout: 3s
retries: 30
Which seems to state it is healthy:
2a45c8bbb6d5 postgres:11-alpine "docker-entrypoint.s…" 8 minutes ago Up 8 minutes (healthy) 5432/tcp db
Very simply, issuing a docker restart heplify-server AFTER docker-compose up -d fixes the problem... thus the depends_on db check is not working as intended.
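One plausible explanation, sketched below: in classic compose file format v3, a plain depends_on list only orders container start and ignores healthchecks, so heplify-server can come up before Postgres finishes its init restart. Waiting for health requires the long-form condition syntax, which is honored in format 2.1 and in the newer Compose specification but not in classic v3 files. The pg_isready check is an alternative to the psql one and assumes a db user named "db".

```yaml
# Sketch: make heplify-server wait for the db healthcheck to pass.
# Long-form depends_on with a condition works in compose file format
# 2.1 and the current Compose spec, but NOT in classic v3 files,
# where depends_on only controls start order.
heplify-server:
  depends_on:
    db:
      condition: service_healthy
db:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U db"]
    interval: 1s
    timeout: 3s
    retries: 30
```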
...which brings us back to the matter at hand: homer-webapp works again, but hard drive space isn't being released by the heplify-server app.
Our only suggestion would be to replace the Docker postgres container with an actual Postgres on the host, and to investigate the space usage on the instance hands-on. Once we know what the issue is, we can most definitely fix it.
Google tells me I am not the only person with this issue
Below are all logs from heplify-server. Notice that "INFO start creating tables" is logged, so we are in the correct bit of code... however, the rotation/drop code never gets called...
Could not find toml config file, use flag defaults. <nil>
2022/10/17 21:23:24.914792 server.go:93: INFO start heplify-server 1.57 with config.HeplifyServer{HEPAddr:"0.0.0.0:9060", HEPTCPAddr:"0.0.0.0:9061", HEPTLSAddr:"", HEPWSAddr:"", ESAddr:"", ESDiscovery:true, HEPv2Enable:true, ESUser:"", ESPass:"", LokiURL:"", LokiBulk:400, LokiTimer:4, LokiBuffer:100000, LokiHEPFilter:[]int{1, 5, 100}, ForceHEPPayload:[]int(nil), PromAddr:"0.0.0.0:9096", PromTargetIP:"", PromTargetName:"", DBShema:"homer7", DBDriver:"postgres", DBAddr:"db:5432", DBSSLMode:"disable", DBUser:"db", DBPass:"<private>", DBDataTable:"homer_data", DBConfTable:"homer_config", DBBulk:400, DBTimer:4, DBBuffer:400000, DBWorker:10, DBRotate:true, DBPartLog:"2h", DBPartIsup:"6h", DBPartSip:"2h", DBPartQos:"6h", DBDropDays:1, DBDropDaysCall:1, DBDropDaysRegister:1, DBDropDaysDefault:1, DBDropOnStart:true, DBUsageProtection:false, DBUsageScheme:"percentage", DBPercentageUsage:"50%", DBMaxSize:"20GB", DBProcDropLimit:2, Dedup:false, DiscardMethod:[]string(nil), CensorMethod:[]string(nil), AlegIDs:[]string(nil), ForceALegID:false, CustomHeader:[]string(nil), IgnoreCaseCH:false, SIPHeader:[]string{"ruri_user", "ruri_domain", "from_user", "from_tag", "to_user", "callid", "cseq", "method", "user_agent"}, LogDbg:"", LogLvl:"info", LogStd:true, LogSys:false, Config:"./heplify-server.toml", ConfigHTTPAddr:"", ConfigHTTPPW:"", Version:false, ScriptEnable:false, ScriptEngine:"lua", ScriptFolder:"", ScriptHEPFilter:[]int{1, 5, 100}, TLSCertFolder:".", TLSMinVersion:"1.2"}
2022/10/17 21:23:24.915263 prometheus.go:38: INFO expose metrics without or unbalanced targets
2022/10/17 21:23:24.916112 rotator.go:539: INFO start creating tables (2022-10-17 21:23:24.915489474 +0000 UTC m=+0.017967068)
2022/10/17 21:23:34.717995 rotator.go:552: INFO end creating tables (2022-10-17 21:23:34.717966654 +0000 UTC m=+9.820444224)
2022/10/17 21:23:35.113477 postgres.go:61: INFO postgres connection established
2022/10/17 21:28:24.915553 server.go:297: INFO stats since last 5 minutes. PPS: 439, HEP: 131735, Filtered: 0, Error: 0
2022/10/17 21:33:24.915660 server.go:297: INFO stats since last 5 minutes. PPS: 1278, HEP: 383609, Filtered: 0, Error: 0
2022/10/17 21:38:24.915470 server.go:297: INFO stats since last 5 minutes. PPS: 1034, HEP: 310394, Filtered: 0, Error: 0
2022/10/17 21:43:24.915481 server.go:297: INFO stats since last 5 minutes. PPS: 501, HEP: 150485, Filtered: 0, Error: 0
Not even at 45 04 *.
Plus, why would such a job only run once a day?
2022/10/18 03:46:24.490337 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 1, Filtered: 0, Error: 2
2022/10/18 03:51:24.490257 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 03:56:24.490278 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:01:24.490243 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:06:24.491181 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:11:24.490237 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:16:24.490274 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:21:24.490183 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:26:24.490197 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:31:24.490215 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:36:24.493830 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:41:24.490255 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:46:24.490275 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:51:24.490165 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
2022/10/18 04:56:24.490283 server.go:297: INFO stats since last 5 minutes. PPS: 0, HEP: 0, Filtered: 0, Error: 0
I am running the hom7-prom-all example docker-compose from GitHub. None of the database rotation environment variables are working.
I have tried:
and
The database does not honor either setting
When I exec into heplify-server, the help screen shows the values, but they are not honored:
-dbusagescheme Change value of DBUsageScheme. (default percentage)
-dbpercentageusage Change value of DBPercentageUsage. (default 75%)
We are ONLY sending INVITE transactions to HEP via the Kamailio 5.6 siptrace module.