guillaume-chevalier / LSTM-Human-Activity-Recognition

Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier
MIT License

Shape of input signal (7352, 128, 9) #6

Closed madhavpr closed 7 years ago

madhavpr commented 7 years ago

Hi Guillaume,

First of all, thanks a lot for this wonderful walkthrough of Human Activity Recognition using LSTMs. Your code worked like a charm on my CPU. I'm new to deep learning and don't have a lot of practical experience with RNN/LSTM. Your code has provided an excellent guide to see what's happening under the hood.

This isn't really an issue, but I was wondering how the number 128 came about. I know it's the number of timesteps per series. Does this number 128 refer to the number of columns in the 9 input text files? More abstractly, I am not sure how to choose the number of timesteps. Is there any guideline for doing this?

Thanks a lot,

guillaume-chevalier commented 7 years ago

Hi to you,

Yes, there are 128 columns in each signal text file; they represent the timesteps of one sample, covering a duration of 2.56 seconds (the sensors are sampled at 50 Hz, and 128 / 50 = 2.56 s). The number 128 was chosen by the authors of the dataset, though I don't know their exact reason for it. Other studies on HAR use different window sizes. Given the current dataset, I could not increase the window size (and I did not try to shorten it); however, the window size is said to be an important parameter to tune for any RNN.
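To make the shape concrete, here is a small NumPy sketch of how 9 per-channel signal files, each with 128 columns per row, stack into a (samples, 128, 9) tensor. The channel names follow the UCI HAR dataset layout; the arrays here are synthetic stand-ins for the real files, and the file-loading line is only a hypothetical illustration.

```python
import numpy as np

# The 9 inertial signal channels in the UCI HAR dataset; in the real dataset
# each name corresponds to a text file of shape (n_samples, 128), e.g. loaded
# with np.loadtxt("train/Inertial Signals/body_acc_x_train.txt").
SIGNAL_NAMES = [
    "body_acc_x", "body_acc_y", "body_acc_z",
    "body_gyro_x", "body_gyro_y", "body_gyro_z",
    "total_acc_x", "total_acc_y", "total_acc_z",
]

# Synthetic stand-in: 4 samples of 128 timesteps per channel
# (the real train split has 7352 samples).
rng = np.random.default_rng(0)
per_channel = [rng.standard_normal((4, 128)) for _ in SIGNAL_NAMES]

# Stacking the 9 channels on a new last axis gives one 2.56 s window
# of 9 sensor channels per sample: shape (n_samples, 128, 9).
X = np.stack(per_channel, axis=-1)
print(X.shape)          # (4, 128, 9)
print(128 / 50.0)       # 2.56 s per window at the 50 Hz sampling rate
```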

To me, it seems that the window size depends on the data relative to the problem you want to solve, and on which algorithm goes on top of it. An empirical approach of trying different window sizes seems good to me; if you have a raw dataset in your hands, I would suggest trying to optimize over different window sizes. There is also a balance to strike in choosing the window size, loosely analogous to Heisenberg's uncertainty principle: a longer window gives the network more context per prediction, but localizes the activity less precisely in time. In my case, only one prediction is made for each 2.56-second window, as I take only the last output of the LSTM cell after it has seen the complete series, the RNN slowly building its embedding.
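If you want to experiment with window sizes empirically, one common approach is to re-segment a continuous recording with a sliding window. This is a minimal sketch, not the repo's code; the recording, window sizes, and 50% overlap are illustrative assumptions.

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Cut a continuous (n_steps, n_channels) recording into overlapping
    windows, returning an array of shape (n_windows, window_size, n_channels)."""
    starts = range(0, signal.shape[0] - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])

# Hypothetical continuous recording: 1000 timesteps of 9 channels at 50 Hz.
recording = np.zeros((1000, 9))

# Compare candidate window sizes (with 50% overlap, like the UCI HAR windows);
# each choice would then be evaluated by training and scoring the model on it.
for w in (64, 128, 256):
    windows = sliding_windows(recording, window_size=w, step=w // 2)
    print(f"window={w}: {windows.shape}")
```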

You might also find interesting information here: https://www.google.ca/webhp?#q=activity+recognition+window+size