robinhood / faust

Python Stream Processing

Feature request: expose non-windowed table with expiration #230

Open mstump opened 5 years ago

mstump commented 5 years ago

I'm using windowed tables as a local cache for an agent. To get expiration I have to use a windowed table with an expiration of one day, but this results in multiple versions of each key being stored over the course of the day. In my case I'm caching the last message from each IoT device so that I can compute deltas. Storing only the last value per key takes tens of GB per node; storing N versions of each key for the entire day could take terabytes per node. With the current abstractions I'd need to implement my own background GC/expiration process for expired keys to reclaim space from non-reporting users.

The current expiration implementation doesn't really make sense for this use case because it's tied to windows. RocksDB does have a native TTL enforcement mechanism that could be used instead, but I'm not sure how difficult it would be to integrate given the design assumptions in the existing table code.
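
The background GC/expiration process mentioned above can be sketched roughly as follows. This is plain Python rather than Faust API: `LastValueCache`, `ttl_seconds`, `clock`, and `sweep` are hypothetical names chosen for illustration, and a real agent would have to run the sweep from a periodic timer against the table's actual backing store.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LastValueCache:
    """Keeps exactly one value per key and drops keys not updated within the TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._data: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        # Overwrites the previous value, so only one version per key is stored.
        self._data[key] = (self._clock(), value)

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        written_at, value = entry
        # Treat an entry as missing once it is older than the TTL,
        # even if the sweep has not removed it yet.
        if self._clock() - written_at >= self._ttl:
            return default
        return value

    def sweep(self) -> int:
        """Delete expired keys to reclaim space; returns the number removed."""
        now = self._clock()
        expired = [k for k, (t, _) in self._data.items() if now - t >= self._ttl]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)
```

Unlike a windowed table, this stores a single entry per device, at the cost of having to schedule `sweep()` yourself.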

ask commented 5 years ago

Seems you're right, RocksDB does support TTL: https://github.com/facebook/rocksdb/wiki/Time-to-Live

We could use this to expire keys in RocksDB, but it doesn't seem to be implemented in python-rocksdb. Adding support doesn't look that hard; it may just be a matter of adding this DBWithTTL class.
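
For context on the semantics described on the linked wiki page: as far as I understand it, the TTL database appends a creation timestamp to each stored value and removes expired entries only during compaction, so a read can still return an entry past its TTL until compaction has run. The pure-Python emulation below illustrates that behavior. It does not use RocksDB or its Python bindings; `TTLStoreSketch` and `compact` are hypothetical names, and the byte layout is illustrative only.

```python
import struct
import time
from typing import Callable, Dict, Optional


class TTLStoreSketch:
    """Emulates 'timestamp suffix + expire on compaction' TTL semantics."""

    _TS = struct.Struct("<i")  # 4-byte timestamp suffix (layout is illustrative)

    def __init__(self, ttl_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Store the value with its creation time appended.
        self._data[key] = value + self._TS.pack(int(self._clock()))

    def get(self, key: bytes) -> Optional[bytes]:
        raw = self._data.get(key)
        if raw is None:
            return None
        # No expiry check here: stale entries stay visible until compaction.
        return raw[:-self._TS.size]

    def compact(self) -> int:
        """Drop entries whose timestamp + ttl < now; returns the number removed."""
        now = int(self._clock())
        stale = [
            k for k, raw in self._data.items()
            if self._TS.unpack(raw[-self._TS.size:])[0] + self._ttl < now
        ]
        for k in stale:
            del self._data[k]
        return len(stale)
```

If the real mechanism behaves this way, a table built on it would reclaim space without a separate sweep, but callers could not rely on expired keys disappearing at an exact time.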

hamroune commented 4 years ago

Hi everyone, is there any plan to add this feature?