scylladb / scylla-code-samples

Code samples for working with ScyllaDB
Apache License 2.0

it's not enough to just replace name of DateTiered to TimeWindow #215

Closed tarzanek closed 1 month ago

tarzanek commented 6 months ago

https://github.com/scylladb/scylla-code-samples/commit/cd1194fa2d612e5306e639af599650b6b88563d7 introduced a regression, since TWCS DOESN'T have options like

'base_time_seconds': 3600, 'max_sstable_age_days': 1

so they need converting; per the docs, these DateTiered options mean:

base_time_seconds: This is the size of the first window, defaults to 3600 seconds (1 hour). The rest of the windows will be min_threshold (default 4) times the size of the previous window.

max_sstable_age_days: This is the cut-off when SSTables won't be compacted anymore: if they only contain data that is older than this value, they will not be included in compactions. This value should be set to some point where you won't (frequently) read any data. In a monitoring system, for example, you might only very rarely read data that is older than one year. This avoids write amplification by not recompacting data that you never read. Defaults to 365 days.

you need to rewrite it using the proper TWCS options: https://enterprise.docs.scylladb.com/stable/cql/compaction.html#twcs-options

@guy9

so the above translates to roughly the following (you cannot fully simulate the DateTiered behavior):

{
  'class' : 'TimeWindowCompactionStrategy',
  'compaction_window_unit' : 'HOURS',
  'compaction_window_size' : 1
}
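To show how the options above would be applied to a table, here is a minimal Python sketch that renders them into an `ALTER TABLE` statement. The function name and the `my_keyspace.my_table` identifier are hypothetical placeholders, not names from the sample repo; only the three TWCS option keys come from this thread.

```python
def twcs_alter_statement(table, window_unit="HOURS", window_size=1, default_ttl=None):
    """Build a CQL ALTER TABLE statement that switches `table` to TWCS.

    `table` is a hypothetical keyspace.table name supplied by the caller.
    """
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": window_size,
    }
    # Render a CQL map literal: strings are quoted, integers are bare,
    # and there is no trailing comma before the closing brace.
    rendered = ", ".join(
        f"'{key}': {value}" if isinstance(value, int) else f"'{key}': '{value}'"
        for key, value in options.items()
    )
    statement = f"ALTER TABLE {table} WITH compaction = {{{rendered}}}"
    if default_ttl is not None:
        # Optional table-level TTL in seconds.
        statement += f" AND default_time_to_live = {default_ttl}"
    return statement + ";"


if __name__ == "__main__":
    print(twcs_alter_statement("my_keyspace.my_table"))
```

The statement is only built as a string here; running it against a cluster is left to whatever CQL client you already use.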

we should ideally also set default_time_to_live on the table to make sure the above won't grow over 20-50 windows. That means setting, on top of the above, a TTL of 3600 x 20 or 3600 x 50 seconds MAX (https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html ; my experience with (older) Scylla is 13-20 windows, with newer Scylla it can probably go a bit higher). But please check if it fits the MMS use case; if not, warn that this is not ideal and that ICS + SAG might be better.

default_time_to_live = 40 372

so that would be 20h retention for such time series data ...

tarzanek commented 6 months ago

or if you need longer retention, then set a bigger window size ...
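As a rough sketch of that trade-off: for a target retention and a cap on the number of windows, the smallest whole-hour window size is the retention divided by the cap, rounded up. The function name is hypothetical, and the default cap of 20 windows is taken from the rule of thumb earlier in this thread, not from any hard Scylla limit.

```python
import math


def min_window_hours(retention_hours, max_windows=20):
    """Smallest whole-hour window that keeps retention within max_windows windows."""
    return math.ceil(retention_hours / max_windows)


if __name__ == "__main__":
    # Example: 30 days of retention with at most 20 windows.
    print(min_window_hours(30 * 24))
```

The result would be used as compaction_window_size with compaction_window_unit set to HOURS.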

guy9 commented 6 months ago

Thanks @tarzanek. Wouldn't 20 windows be: default_time_to_live = 72000 (3600*20)? How did you reach the number of 40372?

tarzanek commented 6 months ago

yep, sorry, wrong copy-paste or typo, use the proper value (72000)
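For reference, the TTL arithmetic discussed above is just window size times window count. A minimal sketch, assuming the 1-hour (3600 s) window from the proposed TWCS options; the function name is hypothetical:

```python
WINDOW_SECONDS = 3600  # one 1-hour compaction window, as proposed above


def ttl_for_windows(window_count, window_seconds=WINDOW_SECONDS):
    """Return default_time_to_live (in seconds) that spans window_count windows."""
    return window_seconds * window_count


if __name__ == "__main__":
    for count in (20, 50):
        print(count, "windows ->", ttl_for_windows(count), "seconds")
```

With 20 windows this gives 72000 seconds (20 hours), and with 50 windows 180000 seconds (50 hours).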

DanielHe4rt commented 1 month ago

@tarzanek @guy9 is this issue already finished? If yes, let's close the discussion.

tarzanek commented 1 month ago

yep, merged, closing