Generate SSTables from CSV or JSON. The data-loading workhorse behind https://ip.thc.org
GNU General Public License v3.0
sstable-migrator
Building
install and use Java 8 - check with java -version
compile - mvn compile
run - mvn exec:java (converts input/* to SSTables in /output)
Setup Cassandra
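Create Network - sudo docker network create cassandra (needed once, because the run command below attaches the container to a network named cassandra)
Start Container - sudo docker run -v ./output/:/ferret/dnsdata -d --name cassandra --hostname cassandra --network cassandra cassandra
(Allow up to a minute for bootup)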
Start a cqlsh shell - sudo docker exec -it cassandra cqlsh
Create Keyspace - CREATE KEYSPACE ferret WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
Create RDNS Table - CREATE TABLE ferret.dnsdata ( tld VARCHAR, apexDomain VARCHAR, recordType VARCHAR, subDomain VARCHAR, ip8 INET, ip16 INET, ip24 INET, ipAddress INET, country VARCHAR, city VARCHAR, asn VARCHAR, as_name VARCHAR, PRIMARY KEY (ip8,ip16,ip24,ipAddress,tld,apexDomain,subDomain) );
Create SubDomains table - CREATE TABLE ferret.subdomains ( domain VARCHAR, subdomain VARCHAR, PRIMARY KEY (domain,subdomain) );
Create CNAME table - CREATE TABLE ferret.cnames ( target VARCHAR, apexDomain VARCHAR, domain VARCHAR, PRIMARY KEY (target,apexDomain,domain) );
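Move Data - sudo docker container exec -it cassandra sstableloader -d 172.18.0.2 /ferret/dnsdata/ (172.18.0.2 is the Cassandra container's address on the Docker network; substitute your own if it differs)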
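The ip8, ip16 and ip24 columns of ferret.dnsdata are not described in this README. As a minimal sketch, assuming they hold the /8, /16 and /24 network addresses of ipAddress (an assumption, not something the source confirms), they could be derived like this in Java 8. The class IpPrefixes and the method prefix are hypothetical names and are not part of sstable-migrator.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpPrefixes {
    // Hypothetical helper: keep the first `bits` bits of an IPv4 literal
    // and zero the rest, e.g. prefix("198.51.100.7", 16) -> "198.51.0.0".
    static String prefix(String ipv4, int bits) {
        if (bits < 0 || bits > 32) {
            throw new IllegalArgumentException("prefix length must be 0..32");
        }
        try {
            // For a literal address, getByName only parses; it does no DNS lookup.
            byte[] raw = InetAddress.getByName(ipv4).getAddress();
            if (raw.length != 4) {
                throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
            }
            int value = ((raw[0] & 0xFF) << 24) | ((raw[1] & 0xFF) << 16)
                      | ((raw[2] & 0xFF) << 8) | (raw[3] & 0xFF);
            // Java shifts are taken modulo 32, so a zero-length prefix needs its own case.
            int mask = bits == 0 ? 0 : -1 << (32 - bits);
            int masked = value & mask;
            byte[] out = {
                (byte) (masked >>> 24), (byte) (masked >>> 16),
                (byte) (masked >>> 8), (byte) masked
            };
            return InetAddress.getByAddress(out).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("invalid address: " + ipv4, e);
        }
    }

    public static void main(String[] args) {
        String ip = "198.51.100.7"; // documentation-range example address
        System.out.println(prefix(ip, 8));  // 198.0.0.0
        System.out.println(prefix(ip, 16)); // 198.51.0.0
        System.out.println(prefix(ip, 24)); // 198.51.100.0
    }
}
```

If the assumption holds, placing ip8 first in the primary key makes it the partition key, so all addresses in the same /8 share one partition.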
Possible Improvements
use Java FileChannel to read files (tried for possible performance gains; no improvement observed)
use the fastjson parser
use multithreaded writes to CQLSSTableWriter (https://issues.apache.org/jira/browse/CASSANDRA-7463) (bad idea: write performance is far better when keys are in order; writes with out-of-order keys use a lot of CPU but do not improve conversion time)
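The note above about key order suggests sorting rows before they reach the writer. This is a minimal sketch, assuming "in order" means rows sorted by their primary key columns; the Row class and the inKeyOrder method are hypothetical, use the ferret.subdomains columns (domain, subdomain) for illustration, and leave out the actual CQLSSTableWriter call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedRows {
    // Hypothetical in-memory row for ferret.subdomains (domain, subdomain).
    static final class Row {
        final String domain;
        final String subdomain;

        Row(String domain, String subdomain) {
            this.domain = domain;
            this.subdomain = subdomain;
        }
    }

    // Return a copy of the rows ordered by partition key (domain),
    // then by clustering column (subdomain). The input list is not modified.
    static List<Row> inKeyOrder(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.domain)
                              .thenComparing((Row r) -> r.subdomain));
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("example.org", "www"),
            new Row("example.com", "mail"),
            new Row("example.com", "api"));
        // Rows for the same domain now arrive together, in subdomain order;
        // each one would be passed to the SSTable writer at this point.
        for (Row r : inKeyOrder(rows)) {
            System.out.println(r.domain + " " + r.subdomain);
        }
    }
}
```

Sorting a whole input file in memory only works when the file fits in the heap; larger inputs would need pre-sorted files or an external sort.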