sipcapture / heplify-server

HEP Capture Server for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0

Recurring postgres partition errors #358

Status: Closed (ijozic closed this issue 4 years ago)

ijozic commented 4 years ago

Hello.

I've deployed Homer 7 on Kubernetes and get the following error every few days (similar to issue #295, but I'm using Postgres):

2020-04-22T07:27:30.676371612Z 2020/04/22 07:27:30.676144 postgres.go:283: ERR pq: no partition of relation "hep_proto_1_call" found for row
2020-04-22T07:27:30.677051563Z 2020/04/22 07:27:30.676936 postgres.go:291: ERR pq: Could not complete operation in a failed transaction

At the same time, the logs for the webapp are showing errors as well:

2020-04-20T13:18:07.696908893Z {"level":"error","msg":"GetTransactionData: We have got error: pq: relation \"hep_proto_100_call\" does not exist","time":"2020-04-20T13:18:07Z"}
2020-04-20T13:18:07.698871754Z {"level":"error","msg":"GetTransactionData: We have got error: pq: relation \"hep_proto_5_call\" does not exist","time":"2020-04-20T13:18:07Z"}

Restarting the Homer pod makes everything work again, but the error reappears after a day or two; sometimes it runs for 3-4 days before failing again. The database is external, but the problem also occurred when the database was part of the same deployment.

I've checked the database and the partitions for that date are all there (i.e. starting with hep_proto_1_call_20200420_0000 up to hep_proto_1_call_20200420_2200), so I suspect the problem is not missing partitions in the database but incorrect determination of the partition name when inserting received HEP events.
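For cross-checking, here is a minimal sketch that computes which partition name should cover a given event timestamp, so it can be compared against the partitions actually present in the database. It is not heplify-server's code: the function name partition_name is hypothetical, the name layout is inferred only from the two partition names quoted above, and both the 2-hour step and the use of UTC are assumptions that need to match the real setup. PostgreSQL routes rows by partition bounds rather than by name, so the name is only a convenient label for checking coverage.

```python
from datetime import datetime, timezone


def partition_name(table: str, ts: datetime, step_minutes: int = 120) -> str:
    """Return the partition name expected to cover `ts`.

    Assumptions (not confirmed by the thread): partitions are named
    <table>_<YYYYMMDD>_<HHMM>, aligned to `step_minutes` from midnight UTC.
    `ts` must be timezone-aware.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    # Round the time of day down to the start of its partition window.
    minutes = ts.hour * 60 + ts.minute
    start = minutes - minutes % step_minutes
    return f"{table}_{ts:%Y%m%d}_{start // 60:02d}{start % 60:02d}"


if __name__ == "__main__":
    # Timestamp taken from the first error line in the report above.
    failed_at = datetime(2020, 4, 22, 7, 27, 30, tzinfo=timezone.utc)
    print(partition_name("hep_proto_1_call", failed_at))
```

If the printed name does not appear among the partitions in the database, the partitions for that window are missing; if it does appear, the bounds of that partition are the next thing to inspect.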

I also have an older Homer 7 deployment (running for a bit more than a month) which doesn't show these issues. Both deployment files pull the latest image from Docker Hub, though the older setup is also split across a few separate deployments.

Any insight into what's causing this would be appreciated.

Thanks.

negbie commented 4 years ago

Please run docker-compose pull and then docker-compose up -d.

This should bring up the latest version, which should fix the problem.

ijozic commented 4 years ago

So it's a known issue that has been fixed? As of which date? Thanks.

negbie commented 4 years ago

With this commit from 2 days ago: https://github.com/sipcapture/heplify-server/commit/f7ee3543096440aaa08796cb5a2e156eed6bd5e8

ijozic commented 4 years ago

Great, thanks. I'll report back if I see it occur again.

negbie commented 4 years ago

Cool, let's close this issue.