instaclustr / cassandra-ttl-remover

Tool for rewriting SSTables to not contain TTLs
https://instaclustr.com
19 stars 8 forks source link
cassandra netapp-public remove remover sstable sstables time-to-live ttl

Cassandra TTL Remover

Tool for rewriting SSTables to remove TTLs

image:https://img.shields.io/maven-central/v/com.instaclustr/ttl-remover-impl.svg?label=Maven%20Central[link=https://search.maven.org/search?q=g:%22com.instaclustr%22%20AND%20a:%22ttl-remover-impl%22] image:https://circleci.com/gh/instaclustr/cassandra-ttl-remover.svg?style=svg["Instaclustr",link="https://circleci.com/gh/instaclustr/cassandra-ttl-remover"]

TTL remover removes TTL information for SSTables by rewriting them and creating new ones, so it looks like data has never expired and they will never expire either. This is handy for testing scenarios and debugging purposes when we want to have data in Cassandra, visible, but the underlying SSTable has expired them in the meanwhile. You can then load them back e.g. by sstableloader to Cassandra.

We are supporting:

Usage

The user of this software might either grab the binaries in Maven Central, or they may build it on their own.

The project consists of these modules:

Each remover for each respective Cassandra version removes TTLs from SSTables a little bit differently. It is notoriously known that Cassandra is little bit hairy when it comes to re-usability in other projects, so we are extending the modifying of some Cassandra classes by copying them over and rewriting stuff which can not be out of the box (e.g. the case for Cassandra 2.2.x is particularly strong here).

The impl module contains an interface SSTableTTLRemover which all Cassandra-specific modules implement. The SPI mechanism will load the concrete remover just because it was put on a class path. Hence, the mechanism to switch between Cassandra versions is to place the correct implementation JAR on the class path and the impl module will do the rest.

buddy-agent contains an agent which is used upon execution for Cassandra 3 and 4 TTL removal. The purpose of this module is to mock some DatabaseDescriptor static methods. Normally, this class is initialized when a Cassandra process is run, but we are not running anything. The removal logic uses these methods—normally we would have to have proper database schema in $CASSANDRA_HOME/data and logic which deals with reading and writing SSTables, and for example would also fetch data from system tables. This is not desirable and by introducing this module the only thing necessary is $CASSANDRA_HOME with libs/jars so we can populate the classpath.

The released artifacts do not ship Cassandra with it—you have to have $CASSANDRA_HOME set—pointing to your Cassandra installation from which you need to remove TTL information from SSTables.

run.sh script

It is recommended to use run.sh helper script if you want to remove TTLs. You are welcome to modify this script at its end to support your case. At the bottom, you see:


CLASSPATH=$CLASSPATH./impl/target/ttl-remover.jar:./cassandra-2/target/ttl-remover-cassandra-2.jar

java -cp "$CLASSPATH./impl/target/ttl-remover.jar:./cassandra-2/target/ttl-remover-cassandra-2.jar" \$JVM_OPTS \ com.instaclustr.cassandra.ttl.cli.TTLRemoverCLI \ --cassandra-version=2 \ --sstables \ /tmp/sstables2/test \ --output-path \ /tmp/stripped \ --cassandra-yaml \ $CASSANDRA_HOME/conf/cassandra.yaml \ --cassandra-storage-dir \ $CASSANDRA_HOME/data

On the other hand, for Cassandra 3/4, the command would look like this. Notice we are using byte-buddy agent, unlike for Cassandra 2 case, and we are also specifying CQL statement as we not have have access to data dir (it may be completely empty) but we need to construct metadata upon TTL removal, so we do it programmatically.


java -javaagent:./buddy-agent/target/byte-buddy-agent.jar \ -cp "$CLASSPATH./impl/target/ttl-remover.jar:./cassandra-3/target/ttl-remover-cassandra-3.jar" \ $JVM_OPTS \ com.instaclustr.cassandra.ttl.cli.TTLRemoverCLI \ --cassandra-version=3 \ --sstables \ /tmp/original-3/test/test \ --output-path \ /tmp/stripped \ --cql \ 'CREATE TABLE IF NOT EXISTS test.test (id uuid, name text, surname text, PRIMARY KEY (id)) WITH default_time_to_live = 10;'

All configuration options are as follows—you get this by the help command just after class specification:


Usage: ttl-remove [-hV] [-c=[INTEGER]] [-d=[DIRECTORY]] [-f=[FILE]] -p= [DIRECTORY] [-q=[CQL]] [-s=[DIRECTORY]] [-t=[FILE]] command for removing TTL from SSTables -p, --output-path=[DIRECTORY] Destination where SSTable will be generated. -f, --cassandra-yaml=[FILE] Path to cassandra.yaml file for loading generated SSTables to Cassandra, relevant only in case of Cassandra 2. -d, --cassandra-storage-dir=[DIRECTORY] Path to cassandra data dir, relevant only in case of Cassandra 2. -s, --sstables=[DIRECTORY] Path to a directory for which all SSTables will have TTL removed. -t, --sstable=[FILE] Path to .db file of a SSTable which will have TTL removed. -c, --cassandra-version=[INTEGER] Version of Cassandra to remove TTL for, might be 2, 3 or 4, defaults to 3 -q, --cql=[CQL] CQL statement which creates table we want to remove TTL from. This has to be set in case --cassandra-version is 3 or 4 -h, --help Show this help message and exit. -V, --version Print version information and exit.


--cassandra-storage-dir is the directory where all your data/SSTables are for your Cassandra installation. This has to be specified; normally it would point to something like /var/lib/cassandra/data to give an example. This has to be specified explicitly.

--cassandra-storage-dir and --cassandra-yaml are only necessary upon Cassandra 2 TTL removal.

--cassandra-yaml is the path to your cassandra.yaml; this has to be specified explicitly too.

Lastly, there has to be --output-path specified too—where your stripped SSTables from TTLs should be.

Load TTL-Removed SSTable to a New Cluster

  1. Create the keyspace and table of the target SStable in the new cluster.

  2. In the source cluster, use the following command to load the ttl-removed SSTable into the new cluster.

    ./sstableloader -d <ip address of new cluster node> [path to the ttl-removed sstable folder]

Build


$ mvn clean install

Tests are skipped by mvn clean install -DskipTests.

Please be sure that your $CASSANDRA_HOME is not set. Unit tests are starting an embedded Cassandra instance which is setting its own "Cassandra home", and having this set externally would confuse tests as it would react to a different Cassandra home.

Further Information

See Danyang Li's blog "TTLRemover: Tool for Removing Cassandra TTLs for Recovery and Testing Purposes"

Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project