Video LSTM auto-encoder built with PyTorch, based on https://arxiv.org/pdf/1502.04681.pdf
This project auto-encodes videos: sequences of consecutive video frames are vectorized with an LSTM auto-encoder.
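The exact model definition lives in the repository code; the snippet below is only a minimal sketch of a sequence auto-encoder in PyTorch. The names `feat_dim`, `hidden_dim`, and the decoding scheme (repeating the encoder's final hidden state at every time step) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMAutoEncoder(nn.Module):
    """Minimal sketch: an LSTM encoder compresses a sequence of frame features
    into one hidden state (the "movement vector"), and an LSTM decoder tries
    to reconstruct the original sequence from that state."""
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        _, (h, c) = self.encoder(x)             # h: (1, batch, hidden_dim)
        # Repeat the sequence summary as the decoder input at every step.
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(z, (h, c))
        return self.output(recon), h[-1]        # reconstruction, movement vector
```

Here `feat_dim` matches the 2048-dimensional per-frame features described below, and the encoder's final hidden state serves as the movement vector that is analyzed later.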
The training data is a collection of cow images sampled from several videos: one frame is kept out of every 50, and 6 consecutive sampled frames form one training sample. Since the videos run at 30 frames per second, one training sample summarizes the movements over roughly 10 seconds. We analyze the movements of the cows by examining the movement vectors produced by the model.
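As a rough illustration of that sampling scheme (not the repository's actual data pipeline), the sequences could be extracted with OpenCV; the constants and function name below are placeholders.

```python
import cv2

FRAME_STRIDE = 50   # keep one frame out of every 50 (~1.7 s apart at 30 fps)
SEQ_LEN = 6         # 6 kept frames cover roughly 10 seconds of video

def sample_sequences(video_path):
    """Yield lists of SEQ_LEN consecutive sampled frames from one video."""
    cap = cv2.VideoCapture(video_path)
    kept, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % FRAME_STRIDE == 0:
            kept.append(frame)
            if len(kept) == SEQ_LEN:
                yield kept
                kept = []
        idx += 1
    cap.release()
```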
The images are encoded with a CNN such as ResNet before being fed into the LSTM auto-encoder. Here, ResNet50 is used: each image is first transformed into a 2048-dimensional feature vector (the activations feeding ResNet50's final fully connected layer). The ResNet50 is also fine-tuned on the training data.
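A minimal sketch of the per-image feature extraction, assuming torchvision's ResNet50 with its classification head removed so that the 2048-dimensional pooled features are returned (the fine-tuning step is omitted here):

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pretrained ResNet50 with the classification head replaced by identity,
# so the forward pass returns the 2048-dimensional pooled features.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # torchvision >= 0.13
resnet.fc = nn.Identity()
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def image_to_vector(path):
    """Encode one sampled frame into a 2048-dimensional feature vector."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return resnet(img).squeeze(0)   # shape: (2048,)
```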
The auto-encoder converges well, as shown by the following learning curve.
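The training details are not spelled out in this section; a minimal training-loop sketch, assuming an MSE reconstruction loss and the Adam optimizer over batches of 6×2048 feature sequences, could look like this:

```python
import torch

model = LSTMAutoEncoder()                        # sketch defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

def train_epoch(loader):
    """One epoch; each batch has shape (batch, 6, 2048)."""
    model.train()
    total = 0.0
    for batch in loader:
        optimizer.zero_grad()
        recon, _ = model(batch)
        loss = criterion(recon, batch)           # reconstruction error
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)
```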
The movements of cows across consecutive frames are vectorized by the model. Changes of color indicate different cow movements.
Dimensionality reduction of the learned movement vectors at different perplexity values.
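The reduction method is not named in this section, but the perplexity parameter suggests t-SNE; below is a sketch with scikit-learn and illustrative perplexity values, with the movement vectors stubbed by random data.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the (num_samples, hidden_dim) movement vectors
# produced by the LSTM encoder.
vectors = np.random.rand(200, 512).astype(np.float32)

# One 2-D embedding per perplexity value, as in the plots above.
embeddings = {
    p: TSNE(n_components=2, perplexity=p, init="pca",
            random_state=0).fit_transform(vectors)
    for p in (5, 30, 50)
}
```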
The labels of different colors are generated automatically by various clustering algorithms. Different labels correspond to different cow movements.
The second clustering result in the Ward column is used for analyzing the labels and their corresponding images. It shows that different labels correspond to different cow movements.
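As an illustration of the Ward clustering step (continuing from the t-SNE sketch above; whether clustering is run on the 2-D embeddings or the raw vectors, and the choice of three clusters, are assumptions matching the three labels in the table below):

```python
from sklearn.cluster import AgglomerativeClustering

# Ward-linkage agglomerative clustering of one of the 2-D embeddings;
# each resulting label should correspond to a distinct cow movement.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(
    embeddings[30]
)
```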
The movements can be summarized as follows:
| label | movement |
|---|---|
| 0 | head and limbs |
| 1 | head and belly (including cow-birth) |
| 2 | head |