microsoft / Trill

Trill is a single-node query processor for temporal or streaming data.
MIT License

Ingress data policy: forward-looking outliers #14

Open cybertyche opened 5 years ago

cybertyche commented 5 years ago

There are already data policies at ingress for data that arrives "late". We can drop, adjust, or throw when data arrives late, and we can hold data in reserve for a certain period of time to allow some reordering.

However, if a data point arrives "too early" we do not have a way to deal with it currently. For instance, if the current data time is X, and the next data point arrives with a timestamp of X + 2 days, this may be a result of:

Today, however, we accept this value as valid and current, and the sync time is advanced all the way to X + 2 days. Any further data is then compared against the new sync time, so subsequent data may start to be dropped or adjusted improperly.
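To make the failure mode concrete, here is a minimal sketch in Python. It is a toy model, not Trill's API: the `Ingress` class, its `reorder_latency` field, and the integer "hour" timestamps are all assumptions made for illustration. It models only the "drop" policy for late data and shows how a single far-future timestamp causes the well-ordered events that follow it to be treated as late.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingress:
    """Toy ingress point that drops late events (hypothetical, not Trill's API)."""

    reorder_latency: int = 0            # how far behind sync time an event may lag
    sync_time: Optional[int] = None     # highest timestamp accepted so far
    accepted: List[int] = field(default_factory=list)
    dropped: List[int] = field(default_factory=list)

    def on_next(self, timestamp: int) -> None:
        # An event is "late" if it falls behind the sync time by more than
        # the reorder latency; under the drop policy it is discarded.
        if self.sync_time is not None and timestamp < self.sync_time - self.reorder_latency:
            self.dropped.append(timestamp)
            return
        self.accepted.append(timestamp)
        # No upper bound is applied here: any accepted timestamp, however far
        # in the future, becomes the new sync time.
        if self.sync_time is None or timestamp > self.sync_time:
            self.sync_time = timestamp


# Timestamps in hours. The value 50 plays the role of "X + 2 days".
ingress = Ingress(reorder_latency=1)
for ts in [0, 1, 2, 50, 3, 4, 5]:
    ingress.on_next(ts)

print("accepted:", ingress.accepted)
print("dropped: ", ingress.dropped)
print("sync time:", ingress.sync_time)
```

In this run the outlier at 50 is accepted and becomes the sync time, so the events at 3, 4, and 5 are dropped even though they arrived in order relative to one another.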

We would like to add an ingress policy that allows for a threshold to be specified for maximum sync time advancement. If a data value arrives so far into the future that the maximum advancement is exceeded, the value is either:

Chaycej commented 3 years ago

If this issue is still open, I can take this up