Open remeq opened 12 months ago
cc @yogendra
Would you be able to share the values file used for this? I see a couple of issues in the tserver-1.
First, I can see a problem in creating the socket:
2023-09-14 08:52:04.579 UTC [631] FATAL: lock file "/tmp/.yb.0.0.0.0:5433/.s.PGSQL.5433.lock" already exists
Second, I see redis-related logs. Are you using / have you enabled YEDIS?
I0914 08:52:04.627539 100 client_master_rpc.cc:76] 0x00005617b28261d8 -> GetTableSchemaRpc(table_identifier: table_name: "redis" namespace { name: "system_redis" database_type: YQL_DATABASE_REDIS }, num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:5265): Keyspace name not found: system_redis (master error 11)
Here is the values file I use to deploy YB via helm:
storage:
  master:
    count: 2
    size: 10Gi
    storageClass: gp2
  tserver:
    count: 2
    size: 10Gi
    storageClass: gp2
resource:
  master:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 2
      memory: 8Gi
  tserver:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 2
      memory: 8Gi
replicas:
  master: 3
  tserver: 3
partition:
  master: 3
  tserver: 3
nodeSelector:
  onramp2: bignode
tserver:
  tolerations:
    - key: dedicated
      operator: Equal
      value: experimental
      effect: NoSchedule
master:
  tolerations:
    - key: dedicated
      operator: Equal
      value: experimental
      effect: NoSchedule
I haven't configured anything related to redis (and didn't know that redis is being used under the hood).
The only odd thing in this is the partition values. A value of 3 means that in case of an upgrade, pods with ordinal 3 and later will be upgraded in the statefulset.
I am assuming:
Is the node pool multi-AZ? There might be a chance that on restart/rescheduling of a pod, it may come up in a different zone. E.g. tserver-0 is on a node in AZ1, and on restart it may come up in AZ2 or AZ3. That may also cause a PV mounting issue.
It is a fully multi-AZ setup. Currently every replica of the tserver is in its own AZ, but I think the PVCs are also created accordingly in the respective AZs. Moreover, after a fresh installation YB runs without issues and we are able to create data in the DB. Only after a few days does it start crashing like that.
I am trying to check on the error message (long lease time). Is there any way to find out why the first restart happened? Was the host drained? Or was there an out-of-memory kill or anything like that?
No, I don't know why the first restart happened. I don't think it is because of the nodes; they all have uptime over 15 days, and the last time YB broke was yesterday. When YB breaks again, I can provide access to our cluster.
In case of a multi-AZ deploy, I would recommend using the multi-AZ procedure: https://docs.yugabyte.com/preview/deploy/kubernetes/multi-zone/eks/helm-chart/
I suspect that pod rescheduling due to restart/crash might be causing an AZ and volume mismatch. The initial deployment would result in a correct mapping due to the provisioning of the PVs.
Thanks, umm, I somehow missed this doc. I will redeploy according to the multi-AZ procedure. It could indeed be the culprit. Thanks again; I will report back.
@yogendra So I have completely redeployed our Yugabyte according to the multi-AZ doc. Unfortunately, after about 2 hours of running, YB broke again. My impression is that it may not be exactly the same type of crash as the one in the original post. What happened is that the currently elected master went down for some reason, and there were about 9 restarts of the container within an hour. Also (not related to those 9 restarts of the pod), from the attached log it is apparent that the YB container has a problem restarting the system inside, because some lock file probably wasn't properly cleaned up from the previous run.
2023-09-24 15:22:01.771 UTC [16896] LOG: YugaByte is ENABLED in PostgreSQL. Transactions are enabled.
2023-09-24 15:22:01.777 UTC [16896] LOG: pgaudit extension initialized
2023-09-24 15:22:01.778 UTC [16896] LOG: listening on IPv4 address "0.0.0.0", port 5433
2023-09-24 15:22:01.779 UTC [16896] FATAL: lock file "/tmp/.yb.0.0.0.0:5433/.s.PGSQL.5433.lock" already exists
2023-09-24 15:22:01.779 UTC [16896] HINT: Is another postmaster (PID 68) using socket file "/tmp/.yb.0.0.0.0:5433/.s.PGSQL.5433"?
2023-09-24 15:22:01.779 UTC [16896] LOG: database system is shut down
The inner YB software tries to run the db server in a loop, so the log is full of those attempts; those attempts are, however, within the same "pod/container run". I don't know what exactly caused the pod itself to be restarted 9 times (after those 9 times it kept running, but the software inside was stuck in the loop from the attached log).
Kinda good news: this kind of crash seems to be "softer". I was able to get YB back online simply by deleting the pod and letting the statefulset recreate it completely new. In the original crash it was much worse (deleting the pod didn't help). yb-tserver-0.log
Hi @remeq, what we need to understand is why the postgres process is crashing.
We need to get the full logs from the start up to the crash if possible; we also need to take a look at the postgres log, which should be in the container as well.
What may be happening in your case is that the process is restarted / crashes and the lock is not removed properly.
The tserver tries to restart the postgres process but goes into a loop because of the lock file.
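As an aside, the staleness check that decides whether such a lock file can be safely removed can be sketched in shell. This is a hypothetical illustration, not a YugabyteDB tool: a postmaster-style lock file records the owning PID on its first line, and the lock is stale only if that process is gone.

```shell
# Hypothetical helper: remove a postmaster-style lock file only if the PID
# recorded on its first line is no longer alive. Just a sketch of the check
# that decides whether a lock like /tmp/.yb.0.0.0.0:5433/.s.PGSQL.5433.lock
# is stale -- not part of YugabyteDB.
remove_stale_lock() {
  lockfile="$1"
  [ -f "$lockfile" ] || return 0          # nothing to do
  lock_pid=$(head -n 1 "$lockfile")
  if kill -0 "$lock_pid" 2>/dev/null; then
    echo "lock held by live PID $lock_pid; leaving it alone"
  else
    rm -f "$lockfile"                     # owner is gone: lock is stale
  fi
}
```

The point is that simply deleting the lock file unconditionally would be unsafe while another postmaster is still running, which matches the HINT in the log above.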
Hi @shinji62, today it crashed again. There were 29 restarts of the tserver pod, and after that it stabilized itself. Here are the logs: https://yblogs123.s3.eu-west-1.amazonaws.com/logs-tserver0-az1.tar.gz
I can see a bunch of "Scheduler is shutting down (system error 108)" messages leading up to the system shutdown. Can I check what activity was ongoing when the server got shut down? Were you running a performance test? Or a high-concurrency job, like >100 concurrent connections?
@yogendra hmm, I haven't run anything specific. The database is used by the Temporal software as a backend DB (besides a bunch of our own microservices), so I am not sure whether that software did something.
Now it broke again:
@yogendra Here is also a gist which contains a full describe of the yb-master-0 pod, and also a full describe of the node this pod is running on, in case resources are somehow the problem.
https://gist.github.com/remeq/5afe5a530b4fa433e060ee430ce7ebc5
@yogendra actually we see this 108 error all the time, even after we freshly recreate the pods and the DB appears to be functional, so the question is: does this error indicate a real problem that should be investigated?
I0928 14:43:21.864732 3323 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
@yogendra I think your question about the number of concurrent connections could be a lead in the right direction. I counted the number of active connections in one of our AZs using the following commands, and the result is frightening:
kubectl exec -it yb-tserver-0 -n yb-eu-west-1a -- /bin/bash
[root@yb-tserver-0 cores]# netstat -an | grep 5433 | grep ESTABLISHED | wc -l
227
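For what it's worth, the pipeline above can be wrapped in a small helper so the count can be re-run against any netstat-style dump (the function name is mine, not anything from YB):

```shell
# Count established client connections to the YSQL port (5433) from
# netstat-style output on stdin. Equivalent to:
#   netstat -an | grep 5433 | grep ESTABLISHED | wc -l
count_ysql_connections() {
  grep ':5433' | grep -c 'ESTABLISHED'
}
```

Usage inside the tserver container would be something like `netstat -an | count_ysql_connections`.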
@remeq how were the CPU / memory at the time of the crash? It could be that one of the pods is overloaded. How do you connect to the DB from your app?
@shinji62 I don't have numbers for CPU/memory right now, but our current focus is fixing our application to not open so many connections to the DB server. I've read in the YB docs that the default limit is 300 connections per node, and we were getting close to it.
Do I assume correctly that if we need a lot of connections, we can "scale" this by adding new nodes to the YB cluster and spreading the connections over the cluster?
Once we bring the number of connections down, I will report back on whether it improved the situation.
@remeq I think in our SaaS we normally allow 15 connections per vCPU per node; of course this is quite conservative, but it works for a large range of workloads.
I am not sure what driver you are using, but you should have a pooling mechanism to reuse connections.
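Applying that rule of thumb to the cluster in this thread (3 tservers with 2 vCPUs each, per the values file earlier) gives a rough connection budget. This is just arithmetic on numbers already mentioned, not an official sizing formula:

```shell
# Back-of-envelope sizing per the 15-connections-per-vCPU guideline:
#   total = tservers * vCPUs per tserver * connections per vCPU
tservers=3
vcpus=2
per_vcpu=15
budget=$((tservers * vcpus * per_vcpu))
echo "$budget"   # well below the ~227 established connections observed earlier
```

So roughly 90 pooled client connections cluster-wide would be the conservative target for this pod size.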
Hmm, so it seems that there are some more issues with our YB. We fixed the number of connections, so currently there are about 25 connections, but YB is still crashing. Currently I am looking at two issues:
1) In the postgres logs I keep seeing the error 108 about the scheduler:
I0928 15:04:02.204464 17087 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I0928 15:04:02.206054 17087 pg_client.cc:162] Session id 1338: Session id acquired. Postgres backend pid: 17087
I0928 15:04:02.248798 17097 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I0928 15:04:02.249640 17097 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I0928 15:04:02.250339 17097 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I0928 15:04:02.251868 17097 pg_client.cc:162] Session id 1339: Session id acquired. Postgres backend pid: 17097
I0928 15:04:08.764662 4055 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
I0928 15:04:15.566011 17114 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I0928 15:04:15.566864 17114 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I0928 15:04:15.567566 17114 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I0928 15:04:15.569221 17114 pg_client.cc:162] Session id 1340: Session id acquired. Postgres backend pid: 17114
I0928 15:04:28.478423 17141 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I0928 15:04:28.479322 17141 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I0928 15:04:28.480062 17141 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I0928 15:04:28.481685 17141 pg_client.cc:162] Session id 1341: Session id acquired. Postgres backend pid: 17141
I0928 15:04:28.517724 17151 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
This error occurs even when I freshly recycle YB (delete the YB pods).
2) The pods crash after a while with the following:
I1003 11:22:40.407141 67 client_master_rpc.cc:76] 0x000055a60dc5e718 -> GetTableSchemaRpc(table_identifier: table_name: "redis" namespace { name: "system_redis" database_type: YQL_DATABASE_REDIS }, num_attempts: 1): Failed, got resp error: Not found (yb/master/catalog_manager.cc:5307): Keyspace name not found: system_redis (master error 11)
W1003 11:22:40.407539 67 client-internal.cc:1408] GetTableSchemaRpc(table_identifier: table_name: "redis" namespace { name: "system_redis" database_type: YQL_DATABASE_REDIS }, num_attempts: 1) failed: Not found (yb/master/catalog_manager.cc:5307): Keyspace name not found: system_redis (master error 11)
*** Aborted at 1696332160 (unix time) try "date -d @1696332160" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x10) received by PID 24 (TID 0x7f6051d0b700) from PID 16; stack trace: ***
@ 0x55a5e4190281 std::__1::__function::__func<>::operator()()
@ 0x55a5e41c5178 yb::redisserver::RedisServiceImpl::Impl::Handle()
@ 0x55a5e41cfa7e yb::redisserver::RedisServiceImpl::Handle()
@ 0x55a5e38eaf8a yb::rpc::ServicePoolImpl::Handle()
@ 0x55a5e382ad7f yb::rpc::InboundCall::InboundCallTask::Run()
@ 0x55a5e38f9b23 yb::rpc::(anonymous namespace)::Worker::Execute()
@ 0x55a5e3fcbb82 yb::Thread::SuperviseThread()
W1003 11:22:40.469089 68 transaction_status_cache.cc:133] Intent for transaction w/o metadata: d8cddb69-7582-4734-be64-f9635cc535fb
W1003 11:22:40.469184 68 transaction_status_cache.cc:133] Intent for transaction w/o metadata: 74b31e59-7d34-49d0-9e3c-d1cacaf2abe4
W1003 11:22:40.469202 68 transaction_status_cache.cc:133] Intent for transaction w/o metadata: e27db424-cfae-4a46-84cf-aa7c2d6848b7
Can you grep on 4055?
Based on the following line, I am wondering whether something happened earlier in thread id 4055 to trigger a shutdown:
I0928 15:04:08.764662 4055 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
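For grepping a single thread out of these glog-format files, a generic filter like this works (nothing YB-specific; in these logs the thread id is the third whitespace-separated field):

```shell
# Filter glog-style lines, e.g.
#   "I0928 15:04:08.764662 4055 poller.cc:66] Poll stopped: ..."
# by thread id, which is the third field.
filter_tid() {
  awk -v tid="$1" '$3 == tid'
}
```

Usage would be along the lines of `filter_tid 4055 < yb-tserver.INFO` (log path varies by deployment).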
@yogendra I am not sure if it has any significance. It seems to be the id of the process that hits the 108 error; it is restarted and fails regularly:
I1003 13:08:48.563877 946 pg_client.cc:162] Session id 26: Session id acquired. Postgres backend pid: 946
I1003 13:08:48.638608 950 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
I1003 13:08:49.157478 958 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I1003 13:08:49.158547 958 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I1003 13:08:49.159335 958 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I1003 13:08:49.161031 958 pg_client.cc:162] Session id 27: Session id acquired. Postgres backend pid: 958
I1003 13:08:49.240470 960 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
I1003 13:08:49.317142 969 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I1003 13:08:49.318174 969 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I1003 13:08:49.318938 969 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
I1003 13:08:49.320544 969 pg_client.cc:162] Session id 28: Session id acquired. Postgres backend pid: 969
I1003 13:08:49.406906 971 poller.cc:66] Poll stopped: Service unavailable (yb/rpc/scheduler.cc:80): Scheduler is shutting down (system error 108)
I1003 13:08:49.438158 980 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I1003 13:08:49.439085 980 thread_pool.cc:167] Starting thread pool { name: pggate_ybclient max_workers: 1024 }
I1003 13:08:49.439265 981 mem_tracker.cc:419] Creating root MemTracker with garbage collection threshold 5242880 bytes
I1003 13:08:49.439823 980 pg_client.cc:155] Using TServer host_port: yb-tserver-0.yb-tservers.yb-eu-west-1a.svc.cluster.local:9100
Here you can see it happened 3 times in a short period.
Note also that I am getting this error in a pod which keeps running (has not crashed), so this particular error is definitely not a result of the whole pod going down.
@yogendra @shinji62 - leaving the scheduler error 108 aside, what about the SIGSEGV crash which I also sent above? Our YB pods keep crashing with that, and it seems to be related to the redis part of YB. Should the redis functionality be somehow enabled/disabled/configured? When I list our databases, there is no system_redis keyspace that YB is looking for:
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
---------------------+----------+----------+---------+-------------+-----------------------
postgres | postgres | UTF8 | C | en_US.UTF-8 |
system_platform | postgres | UTF8 | C | en_US.UTF-8 |
template0 | postgres | UTF8 | C | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | C | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
temporal | yugabyte | UTF8 | C | en_US.UTF-8 |
temporal_visibility | yugabyte | UTF8 | C | en_US.UTF-8 |
yugabyte | postgres | UTF8 | C | en_US.UTF-8 |
(7 rows)
At this point it would be nice to get a core dump; you can check here: https://docs.yugabyte.com/preview/yugabyte-platform/troubleshoot/universe-issues/#collect-core-dumps-in-kubernetes-environments
@shinji62 here is the core dump:
https://yblogs123.s3.eu-west-1.amazonaws.com/core.60
and the SIGSEGV trace is:
*** SIGSEGV (@0x10) received by PID 24 (TID 0x7f6051d0b700) from PID 16; stack trace: ***
@ 0x55a5e4190281 std::__1::__function::__func<>::operator()()
@ 0x55a5e41c5178 yb::redisserver::RedisServiceImpl::Impl::Handle()
@ 0x55a5e41cfa7e yb::redisserver::RedisServiceImpl::Handle()
@ 0x55a5e38eaf8a yb::rpc::ServicePoolImpl::Handle()
@ 0x55a5e382ad7f yb::rpc::InboundCall::InboundCallTask::Run()
@ 0x55a5e38f9b23 yb::rpc::(anonymous namespace)::Worker::Execute()
@ 0x55a5e3fcbb82 yb::Thread::SuperviseThread()
@shinji62
I am now trying to configure redis using the following doc, as it doesn't seem that the helm chart itself supports anything related to the redis config:
https://docs.yugabyte.com/preview/yedis/quick-start/#kubernetes
I've executed the command
/home/yugabyte/bin/yb-admin --master_addresses yb-master-0.yb-masters.yb-eu-west-1a.svc.cluster.local:7100,yb-master-0.yb-masters.yb-eu-west-1b.svc.cluster.local:7100,yb-master-0.yb-masters.yb-eu-west-1c.svc.cluster.local:7100 setup_redis_table
which, according to the doc, should be enough to set up redis; but using the verification command from the same doc, I see that redis is not available:
kubectl exec -it yb-tserver-0 -n yb-eu-west-1a -- /home/yugabyte/bin/redis-cli
Defaulted container "yb-tserver" out of: yb-tserver, yb-cleanup
Could not connect to Redis at 127.0.0.1:6379: Connection refused
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected>
> I am now trying to configure redis using following doc
Please do not use YEDIS, because it is deprecated: https://docs.yugabyte.com/preview/yedis/
@ddorian no, it's not like that. I don't want to use Redis (YEDIS), but I needed to configure it because apparently, without the redis configuration, normal Yugabyte (YSQL/YCQL) keeps crashing. Since yesterday, when I executed the above command, YB hasn't crashed yet.
@remeq Would it be possible for you to join our community Slack and ping me? We could converse quickly and might be able to get on a Zoom call at some point.
I have been busy for the past week and couldn't get around to this. I am trying to understand the coredump, which will take a bit.
In the meantime, can you share how many DBs, schemas, tables, and tablets are in this cluster? I suspect that apart from connections, you might be running into CPU contention due to a high number of tablets. I am trying to check the Temporal docs for the schema.
I am assuming this is still the small (2 vCPU, 8 GB RAM) pod config.
@yogendra sure, I can join Slack or Zoom or assist in any possible way. Please just note that so far it seems I solved the crash by running the redis config mentioned above. Perhaps it's too early to say, but since yesterday YB hasn't crashed (normally it was crashing within a few hours).
@remeq I sent the core dump to the DB team, so I am waiting for their response. Sorry, it does take some time, as I was having trouble opening the core dump.
@shinji62 pls read the previous posts. As of now I can report that YB stopped crashing after I created the redis tables using the command I mentioned above. My conclusion is that the problem is with the YB helm distribution, which after the initial installation expects the redis tables to exist.
@remeq I did read your previous post but I think it is still important to understand what is going on.
We found that a flushdb command is sent to YEDIS, and the backend then crashes because the flushdb is run against a non-existent table.
Now, I am not sure what triggers the flushdb; we are still looking into it.
@shinji62 no problem. Let me know should you need any assistance from our side. Currently our YB has been running without crashing for 14 days already.
Jira Link: DB-8028
Description
Hello,
we are deploying YB using helm chart into our Kubernetes:
After a few days of running, the cluster always starts failing. The yb-tserver-0, yb-tserver-1, and yb-tserver-2 pods are crashing and restarting (~60 restarts of a pod within 2 hours). Currently we fix that by completely uninstalling YB and removing all PVCs (it's a development environment).
I am attaching two log files from the stdout of the containers. I am not sure whether what we see in the logs is the root cause or already a consequence.
Perhaps the following is the issue?
Note that this happens regularly. After a few days the YB always end up like this.
Besides the logs files which contain container stdout, I have also collected yb logs from the containers, but they are too big to upload here.
Regards, Premek tserver-0.log tserver-1.log