vemurikarthik / cassandra

Mirror of Apache Cassandra
http://cassandra.apache.org
Apache License 2.0

Sweep: Data Corruption and OOM Issues During Schema Alterations #1

Open vemurikarthik opened 2 months ago

vemurikarthik commented 2 months ago

Overview: The primary issue is data corruption during schema alterations (ADD/DROP column) on large tables (300+ columns, 6 TB) in the production cluster. It is accompanied by out-of-memory (OOM) errors and other exceptions, specifically during batch reads. The problem has been reproduced on multiple clusters running Apache Cassandra 4.0.12 with DataStax Java Driver 4.17.

Details:

Main Issue:

Data Corruption: When dynamically adding a column to a table, the data intended for the new column is shifted, causing misalignment in the data.
Symptoms: The object implementing com.datastax.oss.driver.api.core.cql.Row returns values shifted against the column names returned by row.getColumnDefinitions(). The driver returns a corrupted row, leading to incorrect data insertion.
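
One way to catch a shifted row before it is inserted elsewhere is to compare each raw value's byte length with the width its declared type requires. The following is a minimal sketch under stated assumptions: RowAlignmentCheck and suspectColumns are hypothetical names, and types are passed as plain strings for illustration. With the driver, the inputs would presumably come from row.getColumnDefinitions().get(i) for the name and type, and row.getBytesUnsafe(i) for the raw bytes.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flags columns whose raw value length contradicts the declared CQL type. */
final class RowAlignmentCheck {

    // Serialized widths, in bytes, for a few fixed-size CQL types.
    private static final Map<String, Integer> FIXED_WIDTHS = Map.of(
            "int", 4, "float", 4, "bigint", 8, "double", 8, "boolean", 1, "uuid", 16);

    /**
     * names, types and rawValues are parallel lists describing one row.
     * Returns the names of columns whose raw length differs from the fixed width.
     * Null values and variable-width types (text, list, ...) are skipped.
     */
    static List<String> suspectColumns(
            List<String> names, List<String> types, List<ByteBuffer> rawValues) {
        List<String> suspects = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            ByteBuffer raw = rawValues.get(i);
            Integer expected = FIXED_WIDTHS.get(types.get(i));
            if (raw == null || expected == null) {
                continue; // nothing to verify for this column
            }
            if (raw.remaining() != expected) {
                suspects.add(names.get(i));
            }
        }
        return suspects;
    }
}
```

A non-empty result does not say which column was added or dropped. It only indicates that the values and the column definitions of that row disagree, so the row should not be trusted.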

Additional Issues:

Exceptions:

java.nio.BufferUnderflowException during batch reads when ALTER TABLE ADD/DROP column statements are issued.
java.lang.ArrayIndexOutOfBoundsException in some cases.

Buffer underflow exceptions with messages like "Invalid 32-bits integer value, expecting 4 bytes but got 292".
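
The 292-byte figure is consistent with a collection value being decoded as an int. Under native protocol v3 and later, a list is serialized as a 4-byte element count followed by a 4-byte length and the bytes of each element, so a list<float> with 36 elements occupies 4 + 36 × (4 + 4) = 292 bytes. The sketch below only illustrates that arithmetic. The element count of 36 is an assumption chosen to match the message, not a value from the cluster, and ListFloatEncoding and decodeInt are hypothetical names that mimic a strict decoder rather than the driver's own codec.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration: serialized size of a list<float> under native protocol v3+. */
final class ListFloatEncoding {

    /** Encodes floats as [int count] followed by ([int length][4-byte float]) per element. */
    static ByteBuffer encode(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + values.length * (4 + 4));
        buf.putInt(values.length);
        for (float v : values) {
            buf.putInt(4);   // element length in bytes
            buf.putFloat(v); // element payload
        }
        buf.flip();
        return buf;
    }

    /** Mimics a strict int decoder that rejects any value that is not exactly 4 bytes. */
    static int decodeInt(ByteBuffer raw) {
        if (raw.remaining() != 4) {
            throw new IllegalArgumentException(
                    "Invalid 32-bits integer value, expecting 4 bytes but got " + raw.remaining());
        }
        return raw.getInt(raw.position());
    }
}
```

If the shifted value is in fact one of the list<float> columns, this would tie the exception to the same misalignment described under Main Issue. Other 292-byte values, such as a blob or text, would produce the same message.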

OOM errors mostly occur during ADD column operations, while the other exceptions occur during DROP column operations.

Reproducibility:

The issue is reproducible on larger tables (300+ columns, 6 TB) but not on smaller tables.
SELECT * statements are used for the reads.
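
To check whether the wildcard is a factor, the same reads could be repeated with an explicit column list, so that the result set shape does not change when a column is added or dropped. This matters mainly if the reads are prepared statements, which is not stated above. The sketch below is one possible way to build such a query. ExplicitProjection is a hypothetical helper, and the keyspace, table, and column names in the usage note are placeholders.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a SELECT with an explicit projection instead of SELECT *. */
final class ExplicitProjection {

    static String select(String keyspace, String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String projection = columns.stream()
                .map(ExplicitProjection::quote)
                .collect(Collectors.joining(", "));
        return "SELECT " + projection + " FROM " + quote(keyspace) + "." + quote(table);
    }

    // Double-quotes an identifier and doubles any embedded quote, so the name is taken literally.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}
```

For example, ExplicitProjection.select("ks", "t", List.of("id", "embedding")) yields SELECT "id", "embedding" FROM "ks"."t". Because quoted identifiers are case-sensitive, the names passed in must match the stored column names exactly.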

Method Specific: Errors occur specifically with row.getList(columnName, Float.class), which either returns incorrect values or throws a driver exception. We pass the exact column name obtained from row.getColumnDefinitions(), yet the value returned for that name is wrong. This suggests that the driver is returning a row whose values do not match its column definitions, rather than a problem with the CQL query itself.

Debugging Efforts:

Metadata Refresh: Enabling metadata refresh did not resolve the issue.
Schema Agreement: session.getCqlSession().checkSchemaAgreement() did not detect inconsistencies during test execution.