scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Writing stopped on ScyllaDB node with "unconfigured table" error #19316

Open amitesh88 opened 3 months ago

amitesh88 commented 3 months ago

We have a cluster with the details listed below, holding about 2 TB of data on each DC1 node. We suddenly hit an issue writing to a table:

2024-06-14T22:38:08.897023585Z stdout F unconfigured table jcx_xx_240615

And below logs were found in ScyllaDB :

[shard 1:stre] storage_proxy - Failed to apply mutation from 101.177.96.xxx#1: raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception (State machine error at raft/server.cc:1212): std::bad_alloc (std::bad_alloc)")

This error appeared on only one of the nodes, and after restarting the server (which took almost 2 hours) the issue was resolved. Writes were using LOCAL_QUORUM. Once the server was back up, we saw the following in the ScyllaDB logs:

scylla[3863]: [shard 6:stre] storage_proxy - Failed to apply mutation from 101.177.97.xxx#22: logalloc::bad_alloc (failed to refill emergency reserve of 30 (have 27 free segments))

scylla[3863]: [shard 0:main] storage_proxy - Exception when communicating with 101.177.96.xxx, to read from system_distributed.service_levels: logalloc::bad_alloc (failed to refill emergency reserve of 30 (have 27 free segments))
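The `logalloc::bad_alloc` above is raised when Scylla's log-structured allocator (LSA) cannot top up its emergency segment reserve: it wants 30 free segments but only finds 27. The following is a hedged Python sketch of that check, modeled purely on the log message; it is not ScyllaDB's actual implementation (which is C++ inside the LSA code), and all names here are illustrative:

```python
# Illustrative sketch of an emergency-reserve refill check, based only on the
# log line "failed to refill emergency reserve of 30 (have 27 free segments)".
# NOT ScyllaDB's real code; function and class names are assumptions.

class BadAlloc(Exception):
    """Stand-in for logalloc::bad_alloc."""
    pass

def refill_emergency_reserve(reserve_goal: int, free_segments: int) -> int:
    """Return the number of segments reserved, or raise BadAlloc if the
    shard cannot find enough free LSA segments to meet the goal."""
    if free_segments < reserve_goal:
        raise BadAlloc(
            f"failed to refill emergency reserve of {reserve_goal} "
            f"(have {free_segments} free segments)"
        )
    return reserve_goal

# The failing case from the logs above: goal of 30, only 27 free segments.
try:
    refill_emergency_reserve(30, 27)
except BadAlloc as e:
    print(e)  # failed to refill emergency reserve of 30 (have 27 free segments)
```

The point of the sketch: the error is a per-shard memory-pressure symptom, not a disk or network failure, which is consistent with CPU and I/O looking normal while writes fail.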

CPU and I/O on the node looked normal. Also, during the issue, nodetool status on that node showed "UN", and scylla-server was reported as up and running.

What could be the possible cause, and how can we avoid this in the future?

Installation details
Scylla version (or git commit hash): 5.4.0
Cluster size: DC1 (writes): 3 nodes, RF=3; DC2 (reads): 6 nodes, RF=3
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 20 on GCP E2 machines

Hardware details (for performance issues)
Platform (physical/VM/cloud instance type/docker): VM
Hardware: sockets/cores/hyperthreading/memory: 16-core CPU, hyperthreaded, 32 GB RAM
Disks (SSD/HDD, count): SSD

mykaul commented 3 months ago

Please upgrade to the latest 5.4 release. You may have hit an issue with bloom filters taking too much memory, which was recently fixed. It could be a different issue, but let's start with the upgrade.

amitesh88 commented 3 months ago

Thanks a lot for the help. Can we go with 5.4.7? Is it the stable version?

mykaul commented 3 months ago

> Thanks a lot for the help. Can we go with 5.4.7? Is it the stable version?

Yes you can.
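For context, 5.4.7 is a patch release on the same 5.4 branch as the running 5.4.0, which is why it fits the "latest 5.4 release" advice above: it is a maintenance upgrade rather than a jump to a new feature branch. A small illustrative sketch of that version comparison (nothing Scylla-specific, just semantic-version arithmetic):

```python
# Illustrative version check: is 5.4.0 -> 5.4.7 a patch-level (maintenance)
# upgrade on the same major.minor branch? Names here are our own.

def parse_version(v: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a comparable tuple."""
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)

current = parse_version("5.4.0")   # version reported in this issue
target = parse_version("5.4.7")    # proposed upgrade target

# Same major.minor branch, strictly higher patch number.
same_branch = current[:2] == target[:2]
is_upgrade = target > current
print(same_branch and is_upgrade)  # True
```

Tuple comparison makes this check trivial in Python, since `(5, 4, 7) > (5, 4, 0)` compares element by element.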