sipcapture / homer7-docker

HOMER 7 Docker Images

homer_data keeps disappearing after 1 day #99

Closed lukeescude closed 3 years ago

lukeescude commented 3 years ago

Hello! I believe one (or more) of the following properties is causing the homer_data database to disappear every day, in addition to homer_config being truncated (completely empty) as well:

  - "HEPLIFYSERVER_DBDROPDAYS=1"
  - "HEPLIFYSERVER_DBDROPDAYSCALL=1"
  - "HEPLIFYSERVER_DBDROPDAYSREGISTER=1"
  - "HEPLIFYSERVER_DBDROPDAYSDEFAULT=1"
  - "HEPLIFYSERVER_DBDROPONSTART=true"
  - "HEPLIFYSERVER_DBUSAGEPROTECTION=true"

I am going to start by disabling DBUsageProtection, then continue up the list to see which one might be causing it.

Symptoms: Cannot log in. Connecting to psql shows no homer_data database at all, and the homer_config db has no tables or data in it.

lmangani commented 3 years ago

@lukeescude this is intentional, as per the DROPDAYS parameter set to 1. Change the following from:

  - "HEPLIFYSERVER_DBDROPDAYS=1"
  - "HEPLIFYSERVER_DBDROPONSTART=true"

to whatever else you want

  - "HEPLIFYSERVER_DBDROPDAYS=5"
  - "HEPLIFYSERVER_DBDROPONSTART=false"
lukeescude commented 3 years ago

@lmangani is it supposed to prevent login? Like I mentioned, it’s completely erasing ALL data, including logins and config data. Heplify is no longer storing HEP packets, and no users can log in.

Is this normal?

lmangani commented 3 years ago

Have you tried using "HEPLIFYSERVER_DBDROPONSTART=false"?

lukeescude commented 3 years ago

Not yet, I will test that soon.

lukeescude commented 3 years ago

I have removed all parameters except the following:

And will report back if it continues happening.

Every morning, the db container is unhealthy/constantly restarting.

lukeescude commented 3 years ago

The database keeps corrupting; it seems to happen at 1:30 AM UTC:

2021-03-18 01:30:02.050 UTC [75] PANIC:  could not create file "pg_wal/xlogtemp.75": No such file or directory
2021-03-18 01:30:02.050 UTC [75] STATEMENT:  COMMIT
2021-03-18 01:30:02.226 UTC [1] LOG:  server process (PID 75) was terminated by signal 6: Aborted
2021-03-18 01:30:02.226 UTC [1] DETAIL:  Failed process was running: COMMIT
2021-03-18 01:30:02.226 UTC [1] LOG:  terminating any other active server processes
2021-03-18 01:30:02.227 UTC [30453] WARNING:  terminating connection because of crash of another server process
2021-03-18 01:30:02.227 UTC [30453] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-18 01:30:02.227 UTC [30453] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-03-18 01:30:02.227 UTC [78] WARNING:  terminating connection because of crash of another server process
2021-03-18 01:30:02.227 UTC [78] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-18 01:30:02.227 UTC [78] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-03-18 01:30:02.230 UTC [57] WARNING:  terminating connection because of crash of another server process
2021-03-18 01:30:02.230 UTC [57] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-18 01:30:02.230 UTC [57] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-03-18 01:30:02.230 UTC [58] LOG:  could not open temporary statistics file "pg_stat/global.tmp": No such file or directory
2021-03-18 01:30:02.237 UTC [76] WARNING:  terminating connection because of crash of another server process
2021-03-18 01:30:02.237 UTC [76] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-18 01:30:02.237 UTC [76] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-03-18 01:30:02.237 UTC [76] CONTEXT:  COPY hep_proto_1_default, line 312: "0_2874270021@10.1.3.48  2021-03-18T01:30:01.992859258Z  {"protocolFamily":2,"protocol":6,"srcIp":"71.7..."
2021-03-18 01:30:02.237 UTC [77] WARNING:  terminating connection because of crash of another server process
2021-03-18 01:30:02.237 UTC [77] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-18 01:30:02.237 UTC [77] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-03-18 01:30:02.237 UTC [77] CONTEXT:  COPY hep_proto_1_default, line 222: "0_742953667@10.10.40.14 2021-03-18T01:30:01.908915Z {"protocolFamily":2,"protocol":6,"srcIp":"66.134..."
2021-03-18 01:30:02.302 UTC [30858] FATAL:  the database system is in recovery mode
2021-03-18 01:30:02.302 UTC [30859] FATAL:  the database system is in recovery mode
2021-03-18 01:30:02.718 UTC [30860] FATAL:  the database system is in recovery mode
2021-03-18 01:30:02.790 UTC [30861] FATAL:  the database system is in recovery mode
2021-03-18 01:30:02.797 UTC [30862] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.099 UTC [30863] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.103 UTC [30864] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.104 UTC [30865] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.105 UTC [30866] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.106 UTC [30867] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.221 UTC [30868] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.472 UTC [30875] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.653 UTC [30876] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.673 UTC [30877] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.746 UTC [30878] FATAL:  the database system is in recovery mode
2021-03-18 01:30:03.837 UTC [1] LOG:  all server processes terminated; reinitializing
2021-03-18 01:30:03.857 UTC [1] PANIC:  could not open control file "global/pg_control": No such file or directory
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: directory "/var/lib/postgresql/data" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/data" or run initdb
with an argument other than "/var/lib/postgresql/data".
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

... etc.

lmangani commented 3 years ago

Clearly the issue is the database itself. Could you confirm whether this was a fresh instance, and what version it is?

2021-03-18 01:30:02.226 UTC [1] LOG:  server process (PID 75) was terminated by signal 6: Aborted
lukeescude commented 3 years ago

postgres:11-alpine (version 11.11). The image was created 5 days ago.

lmangani commented 3 years ago

I would look into whatever happens at 1:30 AM. It looks like a complete drop of the data directory, or perhaps a side effect of a permissions change against the storage directory of the pg container.
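If the timing is that regular, checking what the host schedules around 01:30 would be my first step. A hedged sketch of host-side checks (the paths are the usual Linux locations, which is an assumption; everything is guarded so no command fails hard):

```shell
# Look for anything scheduled near 01:30 that might touch the Postgres volume.
crontab -l 2>/dev/null | grep -n '^30 1 ' || true     # user cron entries at 01:30
grep -rln 'postgres' /etc/crontab /etc/cron.d /etc/cron.daily 2>/dev/null || true
command -v systemctl >/dev/null && systemctl list-timers --all || true
```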

lukeescude commented 3 years ago

Ah yes, it looks like a maintenance job was removing the postgres-data folder without stopping the containers first.

I think that's the culprit. Sorry to bother you, that was careless of me.
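For anyone else who hits this: if a scheduled job has to touch the data directory, quiesce the database container first. A rough sketch (the project path and service name are assumptions; the directory guard makes it a no-op elsewhere):

```shell
#!/bin/sh
# Hypothetical maintenance wrapper: stop Postgres before touching its files.
COMPOSE_DIR=/opt/homer7             # assumed docker-compose project directory
if [ -d "$COMPOSE_DIR" ]; then
  cd "$COMPOSE_DIR"
  docker-compose stop db            # stop the database container cleanly
  # ... run the maintenance task here (backup, cleanup, etc.) ...
  docker-compose start db           # bring Postgres back up
fi
```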

lmangani commented 3 years ago

No problem! It's good to know you found the solution and that there were no related bugs :)