ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Is it possible to do multivariate timeseries binary classification using ludwig? #668

Open Jim2002 opened 4 years ago

Jim2002 commented 4 years ago

Describe the use case
Is it possible to do multivariate timeseries binary classification using Ludwig? I would pass something like the CSV below to Ludwig for training.
time_series_df_example (please disregard that the values in timeseries_data 1 ~ 4 are the same; they are meant to be different)

Describe alternatives you've considered
If this is not possible yet, the alternative I can think of is to combine all the timeseries into one long sequence, but that would not make sense.

w4nderlust commented 4 years ago

@Jim2002 yes, it is possible. You have a lot of flexibility: you can have a separate set of weights to encode each timeseries and then combine them, or you can combine them first and encode them all together with one set of weights. You can also first encode with timeseries-specific weights and then combine with an additional set of weights.

In the first case, you need to create one input feature for each timeseries; you can use different encoders for each feature if you prefer, or you can use the same encoder with the same weights through the tied_weights mechanism. In the second case, you need one input feature for each timeseries, setting encoder: passthrough and using a sequence combiner: combiner: {type: sequence, encoder: yourchoice}. For the third case, you can specify both the encoders for each timeseries feature and also the sequence combiner.
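For instance, the second case could be sketched in a model definition along these lines (a sketch, not tested here; the four feature names and the binary target are illustrative, and the combiner's inner encoder is your choice):

```yaml
input_features:
    -
        name: timeseries_1
        type: timeseries
        encoder: passthrough
    -
        name: timeseries_2
        type: timeseries
        encoder: passthrough
    -
        name: timeseries_3
        type: timeseries
        encoder: passthrough
    -
        name: timeseries_4
        type: timeseries
        encoder: passthrough

combiner:
    type: sequence
    encoder: rnn

output_features:
    -
        name: target
        type: binary
```

The passthrough encoders leave each series untouched, so the sequence combiner sees them stacked together and encodes the multivariate sequence with a single set of weights.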

For more details I suggest you read the sequence, timeseries, and sequence combiner sections in the user guide.

Let me know if you have further questions, otherwise I can close this.

ifokeev commented 4 years ago

@Jim2002 you can also try https://github.com/TDAmeritrade/stumpy to build a matrix profile for each of your timeseries (i.e. use it as the encoder) and pass the result through to an RNN
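As a rough illustration of what a matrix profile is (stumpy computes this far more efficiently; the naive version below is only a sketch of the idea, not stumpy's implementation), each window of a series gets the distance to its nearest non-overlapping neighbor window, so anomalous subsequences stand out as peaks:

```python
import numpy as np

def naive_matrix_profile(ts, m):
    """Naive matrix profile: for each length-m window, the z-normalized
    Euclidean distance to its nearest non-trivial neighbor window."""
    n = len(ts) - m + 1
    windows = np.array([ts[i:i + m] for i in range(n)])
    # z-normalize each window so the comparison is shape-based
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0
    norm = (windows - mu) / sigma
    profile = np.empty(n)
    for i in range(n):
        dists = np.linalg.norm(norm - norm[i], axis=1)
        # exclude trivial matches: windows overlapping window i
        lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
        dists[lo:hi] = np.inf
        profile[i] = dists.min()
    return profile

ts = np.sin(2 * np.pi * np.arange(200) / 50.0)  # exact period of 50 samples
ts[100:110] += 3.0  # inject an anomaly
mp = naive_matrix_profile(ts, m=20)
# mp stays near zero for repeated sine windows and rises for
# windows overlapping the injected anomaly
```

The profile (or the full matrix of nearest-neighbor distances) is then a natural derived representation to feed into a downstream sequence model.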

Jim2002 commented 4 years ago

@w4nderlust @ifokeev Thank you very much for the links and the info, I will have a read and will post questions here if I have any. Much appreciated.

w4nderlust commented 4 years ago

@ifokeev that sounds like a nice way to do preprocessing for timeseries. We do something similar for audio features: they can be used either raw or processed. We could do the same for timeseries data, adding an optional preprocessing step in Ludwig that calls this library. Would that be useful for you? Do you think you would be able to contribute something like this if I give you precise instructions? Let me know!

ifokeev commented 4 years ago

@w4nderlust yeah, I want to contribute this; it would be my pleasure to help. Should I wait until TF2? Actually, I already have a big codebase built around Ludwig which I want to open source.

w4nderlust commented 4 years ago

@ifokeev that's great! This specific thing has to do with data preprocessing, which right now is not touched by the TF2 porting, so if you are willing we can start on this immediately. Other contributions may have to wait until the TF2 porting is done.

ifokeev commented 4 years ago

@w4nderlust ok then. Please provide the instructions for how you see the implementation. I'm already familiar with Ludwig's architecture, but I still need your opinion. I'll try to PR this over the weekend.

w4nderlust commented 4 years ago

Sounds good. The first instruction I can give you is to look at audio_feature.py, specifically these lines: https://github.com/uber/ludwig/blob/master/ludwig/features/audio_feature.py#L134-L140 . The way it works is that if raw is specified, the input audio signal is left untouched, and it's basically a univariate timeseries. If another type of preprocessing is specified, then soundfile is called and the output is a matrix instead. I think the same can be done with the library you suggested; let me know if that is not the case after looking at the implementation of the audio feature.
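Translated to timeseries, the branching could look something like the sketch below. This is hypothetical, not Ludwig's actual API: the function name, the 'matrix_profile' label, and the sliding-window statistics are all stand-ins (in a real integration the matrix branch would call stumpy instead):

```python
import numpy as np

def preprocess_timeseries(signal, preprocessing_type='raw', window=10):
    """Hypothetical sketch of audio_feature.py-style branching:
    'raw' leaves the univariate series untouched; any other type
    turns it into a matrix of derived features instead."""
    signal = np.asarray(signal, dtype=np.float32)
    if preprocessing_type == 'raw':
        # univariate timeseries, shape (length,)
        return signal
    if preprocessing_type == 'matrix_profile':
        # stand-in for the stumpy call: sliding-window mean and std
        # stacked into a matrix of shape (n_windows, 2)
        n = len(signal) - window + 1
        windows = np.array([signal[i:i + window] for i in range(n)])
        return np.stack([windows.mean(axis=1), windows.std(axis=1)], axis=1)
    raise ValueError('unknown preprocessing type: %s' % preprocessing_type)
```

The key point mirrored from the audio feature is the shape contract: the raw branch yields a vector, any processed branch yields a matrix, and downstream encoders handle both.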

Jim2002 commented 4 years ago

@w4nderlust I finished reading through the sections of the user guide. I'm not sure if I understand it correctly, but I tried building the first model you mentioned: one input feature for each timeseries, using the same encoder with shared weights through the tied_weights mechanism. Please see below. After running train, this is what got printed on the console; does this mean that it is only using one input feature, one of the timeseries (timeseries_4)? Thank you.

# what was printed on the console
 'input_features': [   {   'cell_type': 'lstm',
                           'encoder': 'rnn',
                           'name': 'timeseries_4',
                           'tied_weights': 'timeseries_1',
                           'type': 'timeseries'}]
# model_definition.yaml
input_features:
    -
        name: timeseries_1
        type: timeseries
        encoder: rnn
        cell_type: lstm
    -
        name: timeseries_2
        type: timeseries
        encoder: rnn
        cell_type: lstm
        tied_weights: timeseries_1
    -
        name: timeseries_3
        type: timeseries
        encoder: rnn
        cell_type: lstm
        tied_weights: timeseries_1
    -
        name: timeseries_4
        type: timeseries
        encoder: rnn
        cell_type: lstm
        tied_weights: timeseries_1

output_features:
    -
        name: target
        type: binary

w4nderlust commented 4 years ago

@Jim2002 this is correct: the same weights are used to encode all 4 features with an LSTM. Then they are reduced and concatenated (by default the concat combiner is used; you can search for it in the user guide), and finally the binary classification is performed. This should already work as a starting point, and then you can decide to use the additional mechanisms to improve performance.
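Spelling out the default that this relies on, the model definition implicitly behaves as if it contained a combiner section like the following (a sketch; since concat is the default, writing it out is optional):

```yaml
combiner:
    type: concat
```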

Jim2002 commented 4 years ago

@w4nderlust Do you mean switching to different encoders, i.e. trying the second and third cases, to improve performance? The loss of the current model stays the same after 30 epochs.

ifokeev commented 4 years ago

@Jim2002 try different learning rate / features / window size / etc. Ludwig can't do magic without data science skills.

Jim2002 commented 4 years ago

@ifokeev got it, thank you.

w4nderlust commented 4 years ago

@Jim2002 I was referring specifically to mechanisms 2 and 3 in my previous message, that is, to encoding the multivariate features after concatenation, and to encoding both before and after using the sequence combiner. But playing with any other hyperparameter, as @ifokeev suggested, is also important for squeezing out performance. In particular he is right: usually a smaller learning rate is a good idea if training time is not an issue; the model will progress more slowly but will likely converge to a narrower minimum. Ludwig also provides mechanisms for all sorts of things, including decaying the learning rate, increasing the batch size, and many others; take a look at the training section of the user guide.
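These knobs live in the training section of the model definition; a sketch along these lines (parameter names as in the Ludwig user guide of that era; the values are placeholders to tune, not recommendations):

```yaml
training:
    batch_size: 128
    epochs: 100
    learning_rate: 0.0005
    decay: true
    decay_rate: 0.96
    increase_batch_size_on_plateau: 2
```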

Jim2002 commented 4 years ago

@w4nderlust ok thanks a lot.

ifokeev commented 4 years ago

@w4nderlust FYI: found a bug in my implementation. Need time to fix and test.

ifokeev commented 4 years ago

PR with the integration for timeseries https://github.com/uber/ludwig/pull/688

w4nderlust commented 4 years ago

@ifokeev thank you so much, will take a look at it shortly.