Open mocchira opened 6 years ago
In my experience, categorizing data into timestamped buckets helps a lot with data expiration!
@mocchira @windkit I've been thinking about this issue since earlier this week. Once I've organized my thoughts, I'll share them. Actually, it is not easy because it relates to the multi-tenancy feature.
The spec of the current AVS aims to realize
while there was an assumption that DELETE operations against unstructured data rarely happen, so adopting an append-only file format was not a difficult choice for us, even though it comes at a cost: compaction is needed to actually delete data. However, the recent expansion of IoT yields massive amounts of time series data, and the need to delete expired data has gradually increased. It is possible to store data in date-tiered buckets (e.g. YYYYMMDD) and delete a whole bucket once it expires; however, the compaction cost would be non-negligible if that happens too frequently, so I'd like to propose making AVS pluggable so that it can be tailored to specific use cases other than the current one.
Just an idea, but an AVS per bucket seems like a great fit for time series and expiration needs, because deleting one OS file (the AVS) becomes the only thing LeoFS has to do to handle a bucket deletion.
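To make the per-bucket idea concrete, here is a minimal Python sketch under stated assumptions. It is not LeoFS code and does not reflect the real AVS format: the class name `PerBucketStore`, the `.avs` file extension, the length-prefixed record layout, and the `expire` helper are all hypothetical, chosen only to illustrate that with one append-only file per date-tiered bucket, expiring a bucket is a single file removal with no compaction pass.

```python
import os
import struct
from datetime import date, timedelta


class PerBucketStore:
    """Toy append-only store with one container file per bucket (hypothetical)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, bucket):
        return os.path.join(self.root, bucket + ".avs")

    def put(self, bucket, key, value):
        # Append a length-prefixed record; nothing is rewritten in place.
        k = key.encode("utf-8")
        with open(self._path(bucket), "ab") as f:
            f.write(struct.pack(">II", len(k), len(value)) + k + value)

    def get(self, bucket, key):
        # Linear scan where the last record for a key wins.
        # A real store would keep an index instead of scanning.
        wanted = key.encode("utf-8")
        found = None
        try:
            with open(self._path(bucket), "rb") as f:
                while True:
                    header = f.read(8)
                    if len(header) < 8:
                        break
                    klen, vlen = struct.unpack(">II", header)
                    k = f.read(klen)
                    v = f.read(vlen)
                    if k == wanted:
                        found = v
        except FileNotFoundError:
            return None
        return found

    def delete_bucket(self, bucket):
        # Deleting a bucket is one file removal; no compaction is involved.
        try:
            os.remove(self._path(bucket))
            return True
        except FileNotFoundError:
            return False

    def expire(self, today, ttl_days):
        # Assumes buckets are named YYYYMMDD, so string order equals date order.
        cutoff = (today - timedelta(days=ttl_days)).strftime("%Y%m%d")
        removed = []
        for name in sorted(os.listdir(self.root)):
            bucket, ext = os.path.splitext(name)
            if ext == ".avs" and len(bucket) == 8 and bucket.isdigit() and bucket < cutoff:
                self.delete_bucket(bucket)
                removed.append(bucket)
        return removed
```

In this sketch, calling `expire(date.today(), ttl_days=30)` would remove every bucket file older than 30 days. The trade-off, which the sketch does not address, is that deleting individual objects inside a live bucket would still need compaction or tombstones.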
Any comments or suggestions are appreciated.