kairosdb / kairosdb

Fast scalable time series database
Apache License 2.0

Feature request: Handle multiple datapoints tables to utilize TWCS with multiple TTLs #435

Open rdzimmer-zz opened 6 years ago

rdzimmer-zz commented 6 years ago

Time Window Compaction Strategy (TWCS) offers large advantages over Size Tiered Compaction Strategy (STCS) when working with time series data. The data is bucketed into windows as it arrives. Once a window's SSTable is written, it should not need to be compacted again. This reduces compaction work as well as disk space requirements (since the potential 50% overhead of STCS is gone). It also means that queries for portions of the time series data will often be satisfied from fewer SSTables. For example, with weekly buckets a query for data from last week would all be in one SSTable.
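For reference, TWCS is configured per the DataStax docs through the table's `compaction` subproperties. A minimal sketch with weekly windows (the table name and columns here are illustrative, not KairosDB's actual schema):

```cql
-- Illustrative schema, not KairosDB's real data_points layout.
CREATE TABLE metrics_weekly (
    metric text,
    time   timestamp,
    value  double,
    PRIMARY KEY (metric, time)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'
  }
  AND default_time_to_live = 604800;  -- 1 week, so each window's SSTable expires as a whole
```

With the TTL matched to the windowing like this, expired SSTables can be dropped outright instead of compacted.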

https://docs.datastax.com/en/cassandra/latest/cassandra/operations/opsConfigureCompaction.html
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#compactSubprop__compactionSubpropertiesTWCS
https://docs.datastax.com/en/cassandra/latest/cassandra/dml/dmlHowDataMaintain.html
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

However, TWCS works best when all the data in a table has the same TTL. When the TTLs are the same, the entire oldest SSTable can simply be dropped once it expires. When different TTLs are mixed in, extra work has to be done to rewrite the still-live data into new SSTables before the oldest can be dropped. Or, depending on the setup, I believe it is also possible that data will not be deleted from disk as soon as expected (although from a query's perspective it is gone).
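The mixed-TTL situation arises because CQL lets each write carry its own TTL, so one table can hold rows that expire at very different times. A sketch with a hypothetical `data_points` table and made-up values:

```cql
-- Raw datapoint, kept 1 week.
INSERT INTO data_points (metric, time, value)
VALUES ('cpu.load', '2018-01-01 00:00:00', 0.42)
USING TTL 604800;

-- Roll-up written to the same table but kept 1 year:
-- its window's SSTable cannot be dropped until this row expires too.
INSERT INTO data_points (metric, time, value)
VALUES ('cpu.load.hourly_avg', '2018-01-01 00:00:00', 0.40)
USING TTL 31536000;
```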

TWCS settings are applied on a per-table basis. Since KairosDB has a single datapoints table, it is not possible to have different TTLs and still take full advantage of TWCS. For example, roll-ups with a TTL of 1 year go into the same table as raw datapoints with a TTL of 1 week. By allowing a choice of which table datapoints go to, both raw and aggregated data could be sent to the datapoints table whose TWCS settings match the desired TTL.
To clarify, I believe the best option would be to be able to choose the datapoints table for both normal data and roll-ups. Someone might have raw data with different TTLs as well as roll-up data with different TTLs. Basically, selecting a specific datapoints table may be desired for more than just the roll-up feature.
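The proposal could look roughly like the following sketch: two datapoints tables, each with a TWCS window matched to its TTL, and KairosDB choosing the table per write. The table names and simplified columns are hypothetical:

```cql
-- Hypothetical tables; KairosDB would route each datapoint to one of them.
CREATE TABLE data_points_raw (
    metric text, time timestamp, value double,
    PRIMARY KEY (metric, time)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '1'
  }
  AND default_time_to_live = 604800;     -- raw data: 1 week

CREATE TABLE data_points_rollup (
    metric text, time timestamp, value double,
    PRIMARY KEY (metric, time)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'
  }
  AND default_time_to_live = 31536000;   -- roll-ups: 1 year
```

Because each table holds only one TTL, every expired window can be dropped whole, with no rewriting of still-live rows.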

brianhks commented 6 years ago

Can you also post the results from your tests with different TTLs on the same set of SSTables?

rdzimmer-zz commented 5 years ago

Sorry, I must not have gotten the email notification for your question. I may still have more results from back then somewhere if I can find them, but here is what I posted in the forum:

https://groups.google.com/forum/#!topic/kairosdb-group/3ueQZE67CFU

I did some quick experimenting with TWCS.  I set the TWCS compaction_window_size to 20 minutes and started uploading 500K metrics per minute with a TTL of 1 hour.  I also decreased my gc_grace_seconds to 10 minutes and set unchecked_tombstone_compaction to true.  
After an hour I had 3 SSTables of 16MB spaced 20 minutes apart, plus newer SSTables that fell in the current 20 minute window (the current window uses STCS).  By the end of each 20 minute window the oldest bucket was dropped, and then a new one created.  
I then added an additional 250K metrics per minute with a TTL of 2 hours. My 20 minute buckets increased in size to 24MB. However, after two hours I still had only 3 of the 20 minute bucket SSTables, with the oldest at ~1 hour. When it drops the oldest, I believe it is pulling out the 2 hour TTL metrics and putting them into a new SSTable. The number of new SSTables increased again after an hour, when it would have had to start doing this.
So it is definitely doing more work this way, reprocessing the 2 hour TTL data. This was a pretty simple example where the 1 hour data was larger than the 2 hour data. I'm not positive what will happen as the ratios of data TTLs change. It could start keeping the SSTables longer and not getting rid of the expired data. The scenario I'm thinking of is 1 minute raw data with a TTL of 1 week, plus hourly aggregated datapoints kept for 52 weeks. After a year the raw and aggregated data should be close to the same size, but what size buckets would be best? I'll modify my test scenario to be closer to this (just divide the times by ~100 to speed up the test).
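For anyone reproducing this test, the settings described above map to table properties roughly like the following (assuming an existing `data_points` table; `unchecked_tombstone_compaction` is a compaction subproperty, while `gc_grace_seconds` is a top-level table option):

```cql
ALTER TABLE data_points
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'MINUTES',
    'compaction_window_size': '20',
    'unchecked_tombstone_compaction': 'true'
}
AND gc_grace_seconds = 600;  -- 10 minutes
```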
Either way, different tables seems like it would be a big improvement if you want TWCS and aggregation.  Nice thinking ahead when doing the CQL, thanks! :)