ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0
11.11k stars 1.19k forks

Time series beat/non-beat classification #1052

Open SutirthaChakraborty opened 3 years ago

SutirthaChakraborty commented 3 years ago

Hi, I am a newbie. I have a dataset with 5 columns.

  1. timestamp
  2. signal-1
  3. signal-2
  4. signal-3
  5. 0/1 [marked 0 if it's not a beat and 1 if it is a beat]

I want to train a model that predicts beats and non-beats. It is a time-series problem. The labels are heavily imbalanced: there are far more zeros than ones, so the model fails to learn the ones.

The model definition is:

    model_definition = {
        'input_features': [
            {'name': 'time_motion', 'type': 'timeseries'},
            {'name': 'X_motion', 'type': 'timeseries'},
            {'name': 'Y_motion', 'type': 'timeseries'},
            {'name': 'QoM_motion', 'type': 'timeseries'},
        ],
        'output_features': [
            {'name': 'beat_sound', 'type': 'category'}
        ]
    }

Samplle.csv

Can someone help me with this? Colab link - Click here

w4nderlust commented 3 years ago

@SutirthaChakraborty can you give us a bit more insight into the data? If I understand correctly, the input is one long time series, where each row is a sample at a specific timestamp with a binary label (beat, non-beat) associated with it. If this is correct, there are several ways to approach the problem with Ludwig.

The way you set it up, the model looks only at the values in each row to determine whether it is a beat. Even though you declare those columns as timeseries, each cell holds a single number, so they may as well be numerical features. What I believe you want to do instead is preprocess the data so that each cell is a list of numerical values (for instance, the last 10 values before the present time); the model will then also look at the previous values to figure out whether the current timestamp is a beat or not.

Hopefully this makes sense to you. I suggest you check out the timeseries example, which does a similar preprocessing: https://ludwig-ai.github.io/ludwig-docs/examples/#time-series-forecasting-weather-data-example
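The windowing idea can be sketched in plain Python. This is only an illustration, not the helper from the linked example: `lag_windows` is a hypothetical name, and left-padding the earliest rows with the first value is an assumption made so that every row ends up with the same number of entries. Each cell becomes a space-separated string of the previous `window` values.

```python
def lag_windows(values, window):
    """For each position i, return the `window` values before i
    as one space-separated string."""
    rows = []
    for i in range(len(values)):
        start = max(0, i - window)
        past = list(values[start:i])
        # Assumption: early rows lack a full history, so left-pad
        # them with the first value to keep every row the same length.
        pad = [values[0]] * (window - len(past))
        rows.append(" ".join(str(v) for v in pad + past))
    return rows


signal = [0.1, 0.4, 0.9, 0.3, 0.2]
for cell in lag_windows(signal, 3):
    print(cell)
```

The resulting list could be stored as a new column next to the label, so that the row labelled at time t carries the values from t-3 to t-1.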

SutirthaChakraborty commented 3 years ago

Thanks, @w4nderlust for the detailed answer.

Dataset - When I plot the columns, the red lines are the beats and the others are the input signals. (image)

Preprocessing - I have multiple columns to pass to add_sequence_feature_column. Can I pass them together as below, or do I need to concatenate the features side by side, separated by spaces, into a single column?

    add_sequence_feature_column(df, [df.X_motion, df.Y_motion, df.QoM_motion, df.time_motion], 20)
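One possible way to handle several signals, sketched here under assumptions rather than taken from the thread, is to build one windowed column per signal instead of concatenating them. The helper `add_lag_features` and the `_feature` suffix are hypothetical, and dropping the first `window` rows (which lack a full history) is a design choice; backfilling them is another option.

```python
def add_lag_features(rows, columns, window):
    """Add a '<col>_feature' key to each row, holding the previous
    `window` values of <col> as a space-separated string.

    rows: list of dicts ordered by time (e.g. from csv.DictReader).
    The first `window` rows are dropped because they lack a full history.
    """
    out = []
    for i in range(window, len(rows)):
        new_row = dict(rows[i])  # copy so the input rows stay untouched
        for col in columns:
            history = [rows[j][col] for j in range(i - window, i)]
            new_row[col + "_feature"] = " ".join(str(v) for v in history)
        out.append(new_row)
    return out


rows = [
    {"X_motion": 0.1, "Y_motion": 1.0, "beat_sound": 0},
    {"X_motion": 0.2, "Y_motion": 1.1, "beat_sound": 0},
    {"X_motion": 0.3, "Y_motion": 1.2, "beat_sound": 1},
    {"X_motion": 0.4, "Y_motion": 1.3, "beat_sound": 0},
]
windowed = add_lag_features(rows, ["X_motion", "Y_motion"], window=2)
print(windowed[0])
```

Keeping the signals in separate columns lets each one be declared as its own input feature, so the model can encode them independently before combining them.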

Config file -

    model_definition = {
        'input_features': [
            {'name': 'input', 'type': 'numerical'}
        ],
        'output_features': [
            {'name': 'output', 'type': 'category'}
        ]
    }

Is this okay then? My outputs are only 0 for non-beats and 1 for beats.
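If each cell holds a window of past values rather than a single number, the input type would presumably be `timeseries` rather than `numerical`. The following is a sketch under that assumption, not a confirmed answer from the maintainers; the `*_feature` column names are hypothetical and must match whatever the preprocessing step actually produces.

```python
# Hypothetical names of the windowed columns created during preprocessing.
windowed_columns = ["X_motion_feature", "Y_motion_feature", "QoM_motion_feature"]

model_definition = {
    # One timeseries input per windowed signal column.
    "input_features": [
        {"name": name, "type": "timeseries"} for name in windowed_columns
    ],
    # The 0/1 beat label, kept as a category output as in the original config.
    "output_features": [
        {"name": "beat_sound", "type": "category"}
    ],
}

print(model_definition)
```

Ludwig also has a `binary` output type, which may be a closer fit for a 0/1 label than `category`.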

Code - Colab