Tool for rewriting SSTables to remove TTLs
image:https://img.shields.io/maven-central/v/com.instaclustr/ttl-remover-impl.svg?label=Maven%20Central[link=https://search.maven.org/search?q=g:%22com.instaclustr%22%20AND%20a:%22ttl-remover-impl%22] image:https://circleci.com/gh/instaclustr/cassandra-ttl-remover.svg?style=svg["Instaclustr",link="https://circleci.com/gh/instaclustr/cassandra-ttl-remover"]
TTL remover removes TTL information for SSTables by rewriting them and creating new ones, so it looks like data has never expired and they will never expire either.
This is handy for testing scenarios and debugging purposes when we want to have data in Cassandra, visible, but the underlying SSTable has expired them in the meanwhile.
You can then load them back e.g. by sstableloader
to Cassandra.
We are supporting:
The user of this software might either grab the binaries in Maven Central, or they may build it on their own.
The project consists of these modules:
Each remover for each respective Cassandra version removes TTLs from SSTables a little bit differently. It is notoriously known that Cassandra is little bit hairy when it comes to re-usability in other projects, so we are extending the modifying of some Cassandra classes by copying them over and rewriting stuff which can not be out of the box (e.g. the case for Cassandra 2.2.x is particularly strong here).
The impl
module contains an interface SSTableTTLRemover
which all Cassandra-specific modules
implement. The SPI mechanism will load the concrete remover just because it was put on a class path.
Hence, the mechanism to switch between Cassandra versions is to place the correct implementation
JAR on the class path and the impl
module will do the rest.
buddy-agent
contains an agent which is used upon execution for Cassandra 3 and 4 TTL removal. The purpose of this
module is to mock some DatabaseDescriptor
static methods. Normally, this class is initialized when a Cassandra process is run,
but we are not running anything. The removal logic uses these methods—normally we would have to have
proper database schema in $CASSANDRA_HOME/data
and logic which deals with reading and writing SSTables, and for example
would also fetch data from system tables. This is not desirable and by introducing this module
the only thing necessary is $CASSANDRA_HOME
with libs/jars so we can populate the classpath.
The released artifacts do not ship Cassandra with it—you have to have $CASSANDRA_HOME
set—pointing
to your Cassandra installation from which you need to remove TTL information from SSTables.
It is recommended to use run.sh
helper script if you want to remove TTLs. You are welcome to
modify this script at its end to support your case. At the bottom, you see:
CLASSPATH=$CLASSPATH./impl/target/ttl-remover.jar:./cassandra-2/target/ttl-remover-cassandra-2.jar
On the other hand, for Cassandra 3/4, the command would look like this. Notice we are using
byte-buddy agent, unlike for Cassandra 2 case, and we are also specifying CQL statement as we
not have have access to data
dir (it may be completely empty) but we need to construct metadata
upon TTL removal, so we do it programmatically.
All configuration options are as follows—you get this by the help
command just after class specification:
Usage: ttl-remove [-hV] [-c=[INTEGER]] [-d=[DIRECTORY]] [-f=[FILE]] -p= [DIRECTORY] [-q=[CQL]] [-s=[DIRECTORY]] [-t=[FILE]] command for removing TTL from SSTables -p, --output-path=[DIRECTORY] Destination where SSTable will be generated. -f, --cassandra-yaml=[FILE] Path to cassandra.yaml file for loading generated SSTables to Cassandra, relevant only in case of Cassandra 2. -d, --cassandra-storage-dir=[DIRECTORY] Path to cassandra data dir, relevant only in case of Cassandra 2. -s, --sstables=[DIRECTORY] Path to a directory for which all SSTables will have TTL removed. -t, --sstable=[FILE] Path to .db file of a SSTable which will have TTL removed. -c, --cassandra-version=[INTEGER] Version of Cassandra to remove TTL for, might be 2, 3 or 4, defaults to 3 -q, --cql=[CQL] CQL statement which creates table we want to remove TTL from. This has to be set in case --cassandra-version is 3 or 4 -h, --help Show this help message and exit. -V, --version Print version information and exit.
--cassandra-storage-dir
is the directory where all your data/SSTables are
for your Cassandra installation. This has to be specified; normally it would point to something like
/var/lib/cassandra/data
to give an example. This has to be specified explicitly.
--cassandra-storage-dir
and --cassandra-yaml
are only necessary upon Cassandra 2 TTL removal.
--cassandra-yaml
is the path to your cassandra.yaml
; this has to be specified explicitly too.
Lastly, there has to be --output-path
specified too—where your stripped SSTables from TTLs should be.
Create the keyspace and table of the target SStable in the new cluster.
In the source cluster, use the following command to load the ttl-removed SSTable into the new cluster.
./sstableloader -d <ip address of new cluster node> [path to the ttl-removed sstable folder]
Tests are skipped by mvn clean install -DskipTests
.
Please be sure that your $CASSANDRA_HOME is not set. Unit tests are starting an embedded Cassandra instance which is setting its own "Cassandra home", and having this set externally would confuse tests as it would react to a different Cassandra home.
See Danyang Li's blog "TTLRemover: Tool for Removing Cassandra TTLs for Recovery and Testing Purposes"
Please see https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/ for Instaclustr support status of this project