yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YCQL] After some time, intensive workloads start throwing rejections on write #23555

Open pilshchikov opened 2 months ago

pilshchikov commented 2 months ago

Jira Link: DB-12473

Description

Case:

  1. 3-node RF=3 cluster on m7g.large instances
  2. Start cycle:
     2.1. Start CassandraKeyValue workload, wait until the load has started
     2.2. Start CassandraRangeKeyValue workload, wait until the load has started
     2.3. Start CassandraBatchKeyValue workload, wait until the load has started
     2.4. Start CassandraEventData workload, wait until the load has started
     2.5. Start CassandraTransactionalKeyValue workload, wait until the load has started
     2.6. Start CassandraTransactionalRestartRead workload, wait until the load has started
     2.7. Start CassandraTimeseries workload, wait until the load has started
     2.8. Start CassandraUserId workload, wait until the load has started
     2.9. Start CassandraPersonalization workload, wait until the load has started
     2.10. Start CassandraSecondaryIndex workload, wait until the load has started
     2.11. Wait 4 minutes
     2.12. Stop all workloads
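For anyone trying to reproduce this, the cycle in step 2 can be sketched roughly as below. This is only an illustration of the ordering, not the actual test harness: `run_cycle` and its four hooks (`start`, `wait_until_loading`, `stop`, `sleep`) are hypothetical names, and how each workload is really launched and stopped is left to the caller.

```python
from typing import Callable, List

# Workload names in the order listed in the reproduction steps.
WORKLOADS: List[str] = [
    "CassandraKeyValue",
    "CassandraRangeKeyValue",
    "CassandraBatchKeyValue",
    "CassandraEventData",
    "CassandraTransactionalKeyValue",
    "CassandraTransactionalRestartRead",
    "CassandraTimeseries",
    "CassandraUserId",
    "CassandraPersonalization",
    "CassandraSecondaryIndex",
]

SOAK_SECONDS = 4 * 60  # step 2.11: wait 4 minutes


def run_cycle(
    start: Callable[[str], object],
    wait_until_loading: Callable[[object], None],
    stop: Callable[[object], None],
    sleep: Callable[[float], None],
) -> List[str]:
    """Run one cycle. All four callables are hypothetical hooks from the caller."""
    handles = []
    log = []
    # Steps 2.1 to 2.10: start each workload and wait until it is loading.
    for name in WORKLOADS:
        handle = start(name)
        wait_until_loading(handle)
        handles.append(handle)
        log.append(f"started {name}")
    # Step 2.11: let all workloads run together.
    sleep(SOAK_SECONDS)
    log.append(f"soaked {SOAK_SECONDS}s")
    # Step 2.12: stop everything.
    for handle in handles:
        stop(handle)
    log.append(f"stopped {len(handles)} workloads")
    return log
```

Passing the launcher in as callables keeps the sketch independent of how the sample apps are actually started; repeating `run_cycle` five times matches the point at which the failure was first observed.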

After 5 cycles, CassandraSecondaryIndex starts failing and throws errors saying that some information is missing.

Each workload removes its old table and creates a new one.

Inspecting the running universe at that moment shows that data is not inserted properly and some rejections happen exactly in this cycle.

Memory usage is OK, and I could not find the issue in that area. CPU is loaded to 100%, as usually happens.

This started to happen after 2.23.1.0-b18; before that, this test passed consistently.

The only strange exception is:

2024-08-17 03:57:42,349 [Thread-13] ERROR AppBase - Caught Exception: java.lang.IndexOutOfBoundsException: Index: 0
Exception encountered at hosts: 

All logs and the report link are in the first comment of the JIRA ticket.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

pilshchikov commented 2 months ago

@m-iancu The main problem is with CassandraSecondaryIndex. Writes simply did not go through, yet they did not fail either. That means the write itself happens, but under load the index update does not go through.

It reproduces frequently. In this case CassandraSecondaryIndex is running with the batch_size=100 property, and most of the time the issue happens in the first cycle. If batching is disabled, the issue happens in later cycles.
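To make the symptom concrete, a check along these lines would flag rows that were written to the table but cannot be found through the index. It is only a sketch under assumptions: it assumes a key/value table with an index on the value column, and `keys_missing_from_index` and its two dictionary inputs are hypothetical stand-ins for results already fetched from the table and from index lookups.

```python
from typing import Dict, List


def keys_missing_from_index(
    table_rows: Dict[str, str],
    index_rows: Dict[str, List[str]],
) -> List[str]:
    """Return keys present in the table but not reachable via the index.

    table_rows maps key -> value as read from the main table.
    index_rows maps value -> keys returned when querying by that value.
    Both inputs are assumed to be fetched by the caller.
    """
    missing = []
    for key, value in table_rows.items():
        # A row is consistent only if looking up its value returns its key.
        if key not in index_rows.get(value, []):
            missing.append(key)
    return sorted(missing)


# Example: k2 was written, but the index has no entry for its value.
table = {"k1": "v1", "k2": "v2", "k3": "v3"}
index = {"v1": ["k1"], "v3": ["k3"]}
print(keys_missing_from_index(table, index))
```

A non-empty result after a cycle would correspond to the behaviour described above: the write did not fail, but the index entry never appeared.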