ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0
11.08k stars, 1.19k forks

Preprocessing for univariate/multivariate timeseries ? #244

Open yanisIk opened 5 years ago

yanisIk commented 5 years ago

Is your feature request related to a problem? Please describe. Complexity of preprocessing/transforming data for LSTM networks.

Describe the use case: Most timeseries data comes in a format where each row represents one timestamp or step. The usual preprocessing step for timeseries/sequences is to transform N rows into one row, for example with a sliding window.

Describe the solution you'd like: A preprocessing feature that transforms N rows into one row, supporting sliding or tumbling windows and both univariate and multivariate timeseries/sequences.

Input:

Params: timeColumn="time", N=3, WindowType="sliding" and label=["label"]

Output:

Describe alternatives you've considered: Doing it by hand every time.

I'm not very proficient in Python, and manipulating matrices with multiple for loops can become confusing and very error-prone. I'm looking for a simple script that can do this, even if it's not directly integrated as a preprocessor the way tokenizers are.

Thanks
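For reference, the request above can be sketched in plain Python. This is only an illustration, not part of Ludwig: the function name `window_rows`, the `_lagK` column naming, and the choice to take the timestamp and label from the last row of each window are all assumptions, since the expected output layout is not shown in the issue. Rows are assumed to be dicts, for example as read with `csv.DictReader`.

```python
def window_rows(rows, time_column, n, window_type="sliding", label=()):
    """Flatten each window of n consecutive rows into one wide row.

    Hypothetical helper, not a Ludwig API.
    rows: list of dicts, one per timestamp.
    """
    if window_type not in ("sliding", "tumbling"):
        raise ValueError("window_type must be 'sliding' or 'tumbling'")

    # Order by time so that each window covers consecutive steps.
    rows = sorted(rows, key=lambda r: r[time_column])
    if not rows:
        return []

    # Every column except the time column and the labels is a feature,
    # so univariate and multivariate inputs are handled the same way.
    features = [c for c in rows[0] if c != time_column and c not in label]

    # Sliding windows advance one row at a time; tumbling windows do not overlap.
    step = 1 if window_type == "sliding" else n

    out = []
    for start in range(0, len(rows) - n + 1, step):
        window = rows[start:start + n]
        last = window[-1]
        # Assumption: the flattened row keeps the timestamp of its last step.
        flat = {time_column: last[time_column]}
        for col in features:
            for offset, row in enumerate(window):
                # lag0 is the most recent step, lag(n-1) the oldest.
                flat[f"{col}_lag{n - 1 - offset}"] = row[col]
        # Assumption: the label of a window is the label of its last step.
        for col in label:
            flat[col] = last[col]
        out.append(flat)
    return out
```

With the parameters from the request, this would be called as `window_rows(rows, "time", 3, "sliding", ["label"])`. Passing `"tumbling"` instead yields non-overlapping windows and drops any trailing rows that do not fill a complete window.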

w4nderlust commented 5 years ago

This is a good idea. Ludwig's way of dealing with timeseries is reminiscent of text features and may be confusing to someone who is used to the timestamp-based way of representing them. I will consider adding this in future releases.
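To illustrate the text-like representation mentioned here: a Ludwig timeseries feature holds a whole sequence in each row as a space-separated string of numbers, rather than one value per row. The sketch below converts a per-timestamp column into such strings. The name `to_timeseries_strings` is hypothetical, and using only the previous `seq_length` values (excluding the current one) is an assumption made for the example.

```python
def to_timeseries_strings(values, seq_length):
    """Build, for each row, a space-separated string of the previous
    seq_length values. Hypothetical helper, not a Ludwig API.

    Rows without enough history are left as None.
    """
    out = [None] * len(values)
    for i in range(seq_length, len(values)):
        # Assumption: the window covers the seq_length steps before row i.
        out[i] = " ".join(str(v) for v in values[i - seq_length:i])
    return out
```

For example, `to_timeseries_strings([1, 2, 3, 4, 5], 2)` gives `[None, None, "1 2", "2 3", "3 4"]`. The first `seq_length` rows have no full history and would need to be dropped or filled before training.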

SirJohnFranklin commented 5 years ago

It seems that doing it by hand is nasty work, and maybe it is. Still, you can use add_sequence_feature_column(), though you'll notice that it may take a lot of time (days in my case), depending on the size of your dataset. So if the proposed method is not significantly faster, doing it at model creation time may not be an option.

See add_sequence_feature_column(): https://uber.github.io/ludwig/examples/#time-series-forecasting-weather-data-example