stuarteiffert / RNN-for-Human-Activity-Recognition-using-2D-Pose-Input

Activity Recognition from 2D pose using an LSTM RNN

Accuracy for the prediction in real time #16

Open shreyah opened 5 years ago

shreyah commented 5 years ago

Hello Guys,

I was wondering if it is possible to find the accuracy of my prediction when running in real time.

I see from your model (shown below) that the prediction returns the element with the maximum value in the one_hot_predictions array: predictions = one_hot_predictions.argmax(1)

But I want to find the accuracy of the prediction in the range (0, 1). I noticed the individual elements of the one_hot_predictions array are in the range (-6 to +6) (maybe because I have 6 class labels), and the sum of all the elements in one_hot_predictions ranges between (-1, 1).

I am not sure if my deduction is right; it would be great if you could give me more details on finding the accuracy of the prediction.

stuarteiffert commented 5 years ago

Hi Shreyah,

It's not possible to find the 'accuracy' during inference (without having a ground truth); however, it is possible to know how confident the network is in its classification.

You're looking in the right place: one_hot_predictions holds the output of the last layer of the network (tf.matmul(lstm_last_output, _weights['out']) + _biases['out']) and so will give you the best idea of confidence per class.
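To expand on that: the values in one_hot_predictions are raw logits, which is why they aren't bounded to (0, 1). A standard way to turn them into per-class confidences is to apply a softmax — note this is a sketch on made-up logits, not code from the repo:

```python
import numpy as np

def softmax(logits):
    """Map raw logits to a probability distribution over classes."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical logits for one window, 6 classes (illustrative values)
one_hot_predictions = np.array([[2.1, -1.3, 0.4, 5.6, -0.2, 1.0]])
probs = softmax(one_hot_predictions)

predicted_class = probs.argmax(1)[0]    # same class as argmax on the raw logits
confidence = probs[0, predicted_class]  # now a value in (0, 1)
```

Since softmax is monotonic, argmax on the probabilities picks the same class as argmax on the logits, but the probability gives you a (0, 1) confidence score to threshold on.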

shreyah commented 5 years ago

Dear Stuart,

Thanks a lot for sharing this information. It would be great if you could also shed some light on the following:

I am able to do action recognition in real time, but I see that in order to predict, the input array must match the model's dimensions, i.e. I should send data with 32 frames. Is there a way I can make it predict actions given only the first 10 frames? I know that LSTMs can be used for time-series forecasting, but is it possible to predict by feeding only a few frames?

Regards, Shreyah


KristianDukov commented 5 years ago

@shreyah Can you share how are you applying the model on your own data?

Best Regards, Kris

stuarteiffert commented 5 years ago

Hi Shreyah,

Sorry about the delayed response. I think TensorFlow actually needs a set number of steps; it doesn't seem to be as flexible as other libraries like PyTorch, which allow this.

What you can do is simply pad the input sequence with zeros to the required length and use tf.nn.dynamic_rnn(), which takes the sequence length as an input so the RNN ignores the padded steps. This blog explains it a bit better: https://danijar.com/variable-sequence-lengths-in-tensorflow/

Hopefully you already worked it out yourself!