microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

TTL & reliable collections / in-memory caching #713

Open olitomlinson opened 6 years ago

olitomlinson commented 6 years ago

Feature request

After the latest community Q&A session, it was described as likely that TTL capabilities will be implemented for reliable dictionary KeyValues to allow for an in-memory 'caching' strategy.

It sounded like the intent was to create a sliding expiry window i.e. if a KeyValue is updated within the TTL period, it stays in the Dictionary and the TTL counter is reset. If the KeyValue becomes stale through lack of writes, it is evicted after the TTL period elapses.

Can I suggest that you also provide the option of fixing the eviction time relative to the time the key was created, so that subsequent updates/reads of the KeyValue do not change the eviction time?
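To make the difference between the two policies concrete, here is a minimal Python sketch. It is not the Service Fabric API: `TtlDict`, its `sliding` flag, and the injectable `clock` are hypothetical names chosen for illustration, and the sliding mode resets the window on writes only, as described above.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlDict:
    """Toy in-memory map with per-key expiry (illustration only, not a Service Fabric API)."""

    def __init__(self, ttl_seconds: float, sliding: bool,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._sliding = sliding
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        existing = self._items.get(key)
        if existing is not None and existing[1] > now and not self._sliding:
            # Fixed mode: an update keeps the deadline set at creation time.
            expires_at = existing[1]
        else:
            # New key, expired key, or sliding mode: start a fresh TTL window.
            expires_at = now + self._ttl
        self._items[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._items[key]  # lazy eviction on access
            return None
        return value
```

With a 10-second TTL, a key created at t=0 and updated at t=8 is gone at t=12 in fixed mode, but still readable in sliding mode because the update moved its deadline to t=18.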

What's the use case?

You might have heard of the GDPR? It means we have to write software systems that evict data at an explicit time defined by our customers' needs.

A TTL that is fixed is great because we can guarantee that objects (inside the KV) that contain customer data can be deleted at the intended time.

yizhang82 commented 6 years ago

Thanks for the suggestion. We haven't finalized the TTL design yet but most likely we'll allow you to extend the TTL through the read/update/write, or explicitly using an extend operation. If you want to extend TTL through read/update/write you need to pass a flag (or the other way around, we haven't decided yet). Hopefully this should cover your need for a fixed time window. In the first release we may not implement both options.

> A TTL that is fixed is great because we can guarantee that objects (inside the KV) that contain customer data can be deleted at the intended time.

Please keep in mind that expired keys are not guaranteed to be deleted within the TTL window - deletion is best effort. This is the same in most (if not all) databases offering TTLs, as far as I can tell. In most cases you should not see a key in the interval between its expiry and its physical deletion, so it behaves as if it were deleted. However, because of transaction isolation, if you read a key before it expires, another read issued in the same transaction after it expires can still succeed, due to repeatable read isolation semantics. Depending on the interpretation of GDPR (I'm not an expert on GDPR regulations) and your situation, this may or may not be acceptable for GDPR purposes.
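The repeatable read behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not how Reliable Collections implement isolation: `Store`, `Transaction`, and `read_live` are hypothetical names, and the transaction simply remembers the first result it saw for each key.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class Store:
    """Toy store: key -> (value, expires_at). Hypothetical, for illustration only."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self.items: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any, ttl: float) -> None:
        self.items[key] = (value, self.clock() + ttl)

    def read_live(self, key: str) -> Optional[Any]:
        entry = self.items.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # missing or expired: hidden from new reads
        return entry[0]


class Transaction:
    """Repeatable read: the first read of a key fixes the result for this transaction."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self._seen: Dict[str, Optional[Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        if key not in self._seen:
            self._seen[key] = self._store.read_live(key)
        return self._seen[key]
```

In this model, a transaction that read the key before its deadline keeps returning the same value after the deadline passes, while a transaction started after expiry sees nothing.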

yizhang82 commented 6 years ago

/cc @rahku

olitomlinson commented 6 years ago

Great, good to hear you are supportive of the option.

With regards to interpretation of GDPR, my organisation has done substantial investigation from a legal and technical perspective to understand what is and what isn’t acceptable when trying to define & implement ‘data deletion’.

The outcome is:

  1. An engineer must make every effort to delete data from storage by using standard deletion mechanisms. For example, in SQL, a DELETE command would be sufficient.

  2. Data that has been deleted can’t be readily recovered. (Data contained in backup is a gray area, so secure backup storage and recycling is paramount)

  3. Issuing delete commands that result in logical deletion is OK as long as the underlying physical data is encrypted and will eventually be permanently erased during compaction/house-keeping routines. I personally would argue that a user operation to delete a key from a reliable collection constitutes a logical delete, and that the underlying stores eventually perform a physical delete on primaries and secondaries, constituting a permanent delete.

  4. Timely deletion after the expiry has elapsed should be sought. Being prosecuted for being a few seconds, minutes, or even hours late is unlikely. However, being days or weeks late greatly increases the probability of a data breach occurring, and therefore the risk of prosecution.
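The distinction in point 3 between a logical delete and the later physical delete can be illustrated with a toy append-only store. This is only a sketch under assumptions: `LogStructuredStore` and its methods are hypothetical and do not describe how reliable collections persist or replicate data, and encryption of the stored bytes is left out.

```python
from typing import Dict, List, Optional, Tuple

_TOMBSTONE = object()  # marker for a logically deleted key


class LogStructuredStore:
    """Toy append-only store (hypothetical; not how Reliable Collections are implemented)."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, object]] = []  # (key, value-or-tombstone)

    def put(self, key: str, value: bytes) -> None:
        self.log.append((key, value))

    def delete(self, key: str) -> None:
        # Logical delete: the old bytes remain in the log until compaction.
        self.log.append((key, _TOMBSTONE))

    def get(self, key: str) -> Optional[object]:
        for k, v in reversed(self.log):
            if k == key:
                return None if v is _TOMBSTONE else v
        return None

    def compact(self) -> None:
        # Physical delete: keep only the latest live value per key.
        latest: Dict[str, object] = {}
        for k, v in self.log:
            latest[k] = v
        self.log = [(k, v) for k, v in latest.items() if v is not _TOMBSTONE]

    def physically_contains(self, value: bytes) -> bool:
        return any(v == value for _, v in self.log)
```

After `delete`, readers no longer see the key, but the original bytes stay in the log until `compact` runs; only then is the data physically gone.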

At this point in time most customers just want to know that your organisation can accept and execute ad-hoc data deletion requests (within a few weeks' turnaround time), but the safest interpretation of GDPR is to minimise the probability of a data breach by obeying the customer's TTL as closely as possible AND/OR deleting data once it is no longer operationally required by the software.

Hope this helps :)

nabeelrehman2114 commented 4 years ago

Any updates regarding TTL capabilities?

llx9 commented 4 years ago

Any updates regarding TTL capabilities?