nla-group / fABBA

A Python library for the fast symbolic approximation of time series
BSD 3-Clause "New" or "Revised" License

Discontinuous timeseries #2

Closed nsankar closed 2 years ago

nsankar commented 2 years ago

@chenxinye Thanks for developing this great library. My questions are:

1) Can this be applied to discontinuous time series, where the sampling times / measurements are irregular?

2) Are there specific circumstances in which it is better to apply the clustering-based compression?

3) What is the key advantage of this method over SAX?

Thanks in advance.

chenxinye commented 2 years ago

Hi @nsankar

Thanks for your message! To the best of my knowledge, my answers are:

1) Sure, it can be applied to discontinuous time series. But I am not sure what kind of applications you are interested in.

2) To the best of my knowledge, k-means gives the best results among the clustering methods, but aggregation achieves similar performance with a significant speedup. For the most part, I would choose aggregation first and k-means second. Both yield an upper bound on the within-cluster sum of squares (WCSS), which ensures a good-quality symbolic reconstruction of the time series. With density-based clustering it is hard to obtain an accurate reconstruction from a limited number of symbols (density clustering tends to group all objects in a high-density region together, which can result in a high WCSS).

3) ABBA symbols represent the local up-and-down behavior of the time series, which gives higher reconstruction accuracy than SAX for the same number of symbols (for details, see the performance profile experiments in the ABBA and fABBA papers).
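
To illustrate the "local up-and-down behavior" in point 3, here is a simplified sketch of a compression step that turns a series into (length, increment) pieces and rebuilds the series from them. The function names `line_error`, `compress` and `reconstruct`, the tolerance rule, and the sample data are assumptions made for illustration; this is not fABBA's API or its exact algorithm. In ABBA each symbol then stands for a cluster of such pieces, whereas SAX summarizes fixed-width windows by their mean values.

```python
def line_error(series, start, end):
    """Squared error of the straight line from series[start] to series[end]."""
    inc = series[end] - series[start]
    n = end - start
    return sum(
        (series[start] + inc * k / n - series[start + k]) ** 2
        for k in range(n + 1)
    )


def compress(series, tol):
    """Greedily split a series into (length, increment) pieces.

    Hypothetical rule: a piece is extended by one sample as long as the
    straight line over the extended span stays within (span - 1) * tol**2.
    """
    pieces = []
    start = 0
    while start < len(series) - 1:
        end = start + 1
        while (
            end + 1 < len(series)
            and line_error(series, start, end + 1) <= (end - start) * tol ** 2
        ):
            end += 1
        pieces.append((end - start, series[end] - series[start]))
        start = end
    return pieces


def reconstruct(pieces, first_value):
    """Rebuild a series by stitching the linear pieces back together."""
    out = [first_value]
    for length, inc in pieces:
        base = out[-1]
        out.extend(base + inc * k / length for k in range(1, length + 1))
    return out


series = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0]
pieces = compress(series, tol=0.1)
rebuilt = reconstruct(pieces, series[0])
print(pieces)
print(rebuilt)
```

Each piece records how long a trend lasts and how far the series rises or falls over it, so an "up, down, up" shape is kept as three pieces regardless of where a fixed window boundary would fall.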

Hope these are helpful. Please feel free to let me know if you have any further questions. After Stefan is back, he can offer more expert answers and feedback.

nsankar commented 2 years ago

@chenxinye Thanks for your response. By discontinuous time series, I meant, for instance, human pose coordinate measurements, which are discrete over time.
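
If such measurements arrive at irregular times, one common preprocessing option (an assumption here, not something the library is known to require) is to resample them onto a uniform time grid before the symbolic approximation. The sketch below uses plain linear interpolation; the helper name `resample_uniform` and the sample data are made up for illustration.

```python
from bisect import bisect_right


def resample_uniform(times, values, n_points):
    """Linearly interpolate irregular samples onto n_points evenly spaced times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples, with one value per time")
    if any(t1 <= t0 for t0, t1 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    start, stop = times[0], times[-1]
    step = (stop - start) / (n_points - 1)
    grid = [start + i * step for i in range(n_points)]
    grid[-1] = stop  # guard against floating-point drift at the right edge

    resampled = []
    for t in grid:
        # Index j of the sample interval [times[j], times[j + 1]] containing t.
        j = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled


# Hypothetical one-coordinate pose track sampled at uneven times.
times = [0.0, 0.5, 2.0, 2.5, 4.0]
values = [0.0, 1.0, 4.0, 3.0, 0.0]
grid, uniform = resample_uniform(times, values, 5)
print(grid)
print(uniform)
```

The resampled values form an ordinary evenly spaced 1-D series. For multi-coordinate pose data, each coordinate could be resampled onto the same grid in the same way.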

With respect to the clustering, my question was more about the characteristics of the time series. For instance, if I recall correctly, in one of the SAX test data sets the time series represented different human movement states such as walking, running, shooting, etc. So I was wondering whether, for such data with different states captured in one time series, a clustering method might group these states, so that the final outcome might be better than with aggregation. This is something I am looking at trying out.
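
One rough way to try this out is to compare groupings of the (length, increment) pieces by their WCSS. The sketch below is a simplified greedy aggregation in the spirit of the approach discussed above, not the library's implementation: the names `aggregate` and `wcss`, the choice of sorting by norm, and the sample pieces are all assumptions for illustration.

```python
import math


def aggregate(points, alpha):
    """Greedy aggregation: group every unassigned point within alpha of a starting point.

    Points are visited in order of increasing Euclidean norm (an assumed ordering).
    Returns one group label per point and the indices of the starting points.
    """
    order = sorted(range(len(points)), key=lambda i: math.hypot(*points[i]))
    labels = [-1] * len(points)
    starts = []
    for i in order:
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        group = len(starts)
        starts.append(i)
        for j in order:
            if labels[j] == -1 and math.dist(points[i], points[j]) <= alpha:
                labels[j] = group
    return labels, starts


def wcss(points, labels):
    """Within-cluster sum of squares, measured from each group's mean."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(math.dist(p, centre) ** 2 for p in members)
    return total


# Hypothetical (length, increment) pieces from two movement states.
pieces = [(1.0, 0.9), (1.1, 1.0), (1.0, 1.1), (3.0, -2.0), (3.1, -2.1), (2.9, -1.9)]
alpha = 0.5
labels, starts = aggregate(pieces, alpha)
score = wcss(pieces, labels)
print(labels, score)
```

In this sketch every point lies within `alpha` of its group's starting point, and the group mean minimizes the sum of squared distances, so the WCSS cannot exceed `len(pieces) * alpha ** 2`. Labels from another clustering method can be passed to the same `wcss` function for a side-by-side comparison.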