zhihanyue / ts2vec

A universal time series representation learning framework
MIT License

data shape, loading custom data, possible lookahead #14

Closed gminorcoles closed 2 years ago

gminorcoles commented 2 years ago

Hi, I am trying to test on my own datasets, which are multivariate time series. I load the data into a DataFrame and then create the slices for train, validation, and test, mimicking the existing code.

There is a point where my n x m data, where m is the number of features (or covariate time series), is expanded to 1 x n x m. The comments in your code say "number of instances x timestamps x features". What does "instances" mean in this context?

I am worried that my results are too good to be true, and I am trying to understand where lookahead might be creeping in.

zhihanyue commented 2 years ago

"instance" means "sample". The difference between instances and features is that instances generally come from the same distribution. For example, for the stock market, n_instance is the number of stocks, and n_features is the number of features (e.g. open, high, low, close, volume). For traffic flow, n_instance can be the number of roads, and n_features is the number of features collected on each road.

For a multivariate time series whose data shape is "n_variables x n_timestamps", the input shape may be either "1 x n_timestamps x n_variables" or "n_variables x n_timestamps x 1". It depends on whether these variables are features or samples.
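A minimal sketch of the two layouts described above, assuming NumPy and made-up sizes (5 variables, 100 timestamps). The array names are hypothetical and are not taken from the ts2vec code:

```python
import numpy as np

# Hypothetical data: 5 variables observed over 100 timestamps,
# stored as (n_variables, n_timestamps).
n_variables, n_timestamps = 5, 100
data = np.random.default_rng(0).normal(size=(n_variables, n_timestamps))

# Case 1: the variables are features of a single instance.
# Transpose to (n_timestamps, n_variables), then add a leading instance axis.
as_features = data.T[np.newaxis, :, :]

# Case 2: each variable is its own instance with one feature.
# Keep the order and add a trailing feature axis.
as_instances = data[:, :, np.newaxis]

print(as_features.shape)   # (1, 100, 5)
print(as_instances.shape)  # (5, 100, 1)
```

In both cases the values are unchanged; only the axis that holds the variables differs, which is what tells the model whether they are features of one series or separate samples.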

gminorcoles commented 2 years ago

Thank you for the clarification. I had not read the recent papers using contrastive loss, so I am catching up. This is good work, thank you.