How to handle Out of Distribution for LSTM sequence classifier

microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

https://docs.microsoft.com/cognitive-toolkit/

How to handle Out of Distribution for LSTM sequence classifier #3797

Open MihaiHoriaPopescu opened 4 years ago

MihaiHoriaPopescu commented 4 years ago

Hello, I have a problem with my classification model. I am trying to create a classifier for VR gesture recognition, where we cannot train with negative observations. I found that this problem is well known in real-world classification, since you often need to handle out-of-distribution inputs. Has anyone handled this problem?

delzac commented 4 years ago

I have already provided the 2 ways you can resolve this in my previous reply.

Either create an Other gesture class in the classifier or build an anomaly detector to filter out unwanted gestures as a preprocessing step.

MihaiHoriaPopescu commented 4 years ago

@delzac Hello, creating an Other label is not suitable for my model. I have tried to create a balanced dataset with positive/negative gestures, building the negative gestures either by assembling parts of the other gestures or by recording additional gestures, but the results were not good enough to detect different gestures as residuals. Can you link an example implementation of an anomaly detector in CNTK? I didn't find any solution, and implementing it myself is quite hard, since the documentation for the CNTK API is lacking.

delzac commented 4 years ago

There's no need to use neural nets to build anomaly detectors. Scikit-learn has many algorithms that you can use; you can check it out here.

If you insist on using CNTK to build an anomaly detector, then you need to build an autoencoder.
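To make the "filter as a preprocessing step" idea concrete, here is a minimal sketch using only the Python standard library. It is neither scikit-learn's nor CNTK's API: the class `NearestNeighbourNoveltyDetector`, its `quantile` and `margin` parameters, and the toy data are all hypothetical, and it assumes each gesture has already been reduced to a fixed-length feature vector. The detector only learns from known gestures, so no negative examples are needed; a scikit-learn detector or an autoencoder's reconstruction error could play the same role.

```python
import math


class NearestNeighbourNoveltyDetector:
    """Flags a vector as unknown when it lies far from every training vector.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, quantile=0.95, margin=1.5):
        self.quantile = quantile  # which training distance sets the baseline
        self.margin = margin      # slack multiplied onto that baseline
        self.train = []
        self.threshold = None

    def _nearest(self, vector, skip=None):
        # Euclidean distance to the closest training vector, optionally
        # ignoring one index (used for leave-one-out during fitting).
        return min(
            math.dist(vector, other)
            for i, other in enumerate(self.train)
            if i != skip
        )

    def fit(self, vectors):
        self.train = [list(v) for v in vectors]
        if len(self.train) < 2:
            raise ValueError("need at least two training vectors")
        # Distance from each training vector to its nearest *other* neighbour.
        distances = sorted(
            self._nearest(v, skip=i) for i, v in enumerate(self.train)
        )
        index = min(len(distances) - 1, int(self.quantile * len(distances)))
        self.threshold = distances[index] * self.margin
        return self

    def score(self, vector):
        # Larger score means less similar to anything seen in training.
        return self._nearest(vector)

    def is_known(self, vector):
        return self.score(vector) <= self.threshold


# Toy feature vectors for two known gestures (made-up numbers).
known_gestures = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0], [6.0, 6.0],
]
detector = NearestNeighbourNoveltyDetector().fit(known_gestures)

familiar = detector.is_known([0.5, 0.5])      # close to the first cluster
unfamiliar = detector.is_known([20.0, 20.0])  # far from both clusters
```

Only inputs for which `is_known` returns `True` would then be passed on to the gesture classifier; the rest are rejected before classification.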

MihaiHoriaPopescu commented 4 years ago

@delzac Thank you for the reply. It's not that I insist on implementing it in CNTK, but classical classification works on tabular data, whereas I am using sequences of different lengths. Instead of having probabilities that sum to 1 for the classification, I was wondering whether it is possible to use the output as a similarity to the trained gestures. I'm going to take a look at autoencoders. I was also thinking about using time series with pattern recognition, which would probably solve the problem. Are there examples of LSTMs used on time series for pattern recognition instead of prediction?
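One way to picture the difference between probabilities that sum to 1 and independent per-class scores is the sketch below. It assumes the network exposes one raw output (logit) per gesture and has been trained with a per-class sigmoid loss; the function names, labels, threshold, and logit values are hypothetical. It illustrates the idea only and is not a guaranteed out-of-distribution detector.

```python
import math


def softmax(logits):
    # Normalised scores: they always sum to 1, so some class always "wins".
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function for a single logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify_with_rejection(logits, labels, threshold=0.5):
    """Return (label, scores); label is None when no class is confident."""
    scores = [sigmoid(x) for x in logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return None, scores
    return labels[best], scores


labels = ["swipe", "circle", "push"]  # hypothetical gesture names

# All logits are low: the input resembles none of the trained gestures.
weak_logits = [-4.0, -3.0, -5.0]
softmax_scores = softmax(weak_logits)  # still puts about 0.67 on "circle"
rejected_label, _ = classify_with_rejection(weak_logits, labels)

# One logit is clearly high: the input resembles "circle".
strong_logits = [-4.0, 3.0, -5.0]
accepted_label, _ = classify_with_rejection(strong_logits, labels)
```

With softmax, the weak input is still assigned to a class, while the per-class scores all stay below the threshold and the input can be rejected.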

delzac commented 4 years ago

In supervised learning, there are only two tasks: (1) classification and (2) regression. Pattern recognition is classification.

You can always convert your gesture sequences of different lengths into a single fixed-dimension vector, much like in movie review sentiment classification, where the review texts are of unequal length but each could be positive, negative, or neutral.
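As a rough illustration of mapping sequences of different lengths to one fixed-size vector, the sketch below pools each feature over time with summary statistics. This is a deliberately simple stand-in chosen for the example, not the method from the thread; in a sequence model, the final LSTM state would typically serve as the fixed-size vector instead. The function name `sequence_to_vector` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def sequence_to_vector(sequence):
    """Summarise a variable-length sequence of frames as one fixed-size vector.

    Each frame is a list of floats with the same number of features.
    The result holds mean, standard deviation, min and max per feature,
    so its length is 4 * n_features regardless of the sequence length.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one frame")
    columns = list(zip(*sequence))  # one tuple per feature, across all frames
    vector = []
    for column in columns:
        vector.extend([mean(column), pstdev(column), min(column), max(column)])
    return vector


# Two gestures with 2 features per frame but different numbers of frames.
short_gesture = [[0.0, 1.0], [1.0, 3.0]]
long_gesture = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

short_vector = sequence_to_vector(short_gesture)
long_vector = sequence_to_vector(long_gesture)
```

Both vectors have the same length, so they can be fed to any classifier or anomaly detector that expects tabular input. The trade-off is that pooling discards the order of the frames, which a recurrent encoder would keep.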