mit-han-lab / temporal-shift-module

[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
https://arxiv.org/abs/1811.08383
MIT License

Gesture image of the online demo #142

Closed · czqInNanjing closed this 4 years ago

czqInNanjing commented 4 years ago

Hi, I have successfully deployed the online demo on my TX2, but I found that some gestures are not easy to recognize. I am not sure whether it is because my gesture is wrong.

I wonder what dataset was used to train the model in the demo. Could you provide some images (signs) of the gestures? Thank you very much!

Nauman007 commented 4 years ago

You can try the gestures they used in their online demo as a cross-check...

czqInNanjing commented 4 years ago

the dataset is from here: https://20bn.com/datasets/jester/v1

NB-Xie commented 3 years ago

> the dataset is from here: https://20bn.com/datasets/jester/v1

Hello there! We are working on the same issue. May I know:

  1. Did you use the "mobilenet_v2_tsm.py" they provided in the demo (which includes the buffer) to train, or did you use the TSN models they provided for training?
  2. And did you change the bi-directional shift to a uni-directional shift in training?

Appreciate it!

Nauman007 commented 3 years ago

Hi, I used the TSN models they provided; specifically, I used the MobileNet variant because I wanted a low-cost model. Secondly, I did change the bi-directional shift to a uni-directional shift while training.
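
In case it helps anyone following along, the difference between the two variants looks roughly like this. This is only a sketch, not the repository's exact ops/temporal_shift.py code; the (N * T, C, H, W) layout and the 1/8 fold ratio are assumptions taken from the paper's description:

```python
import torch

def temporal_shift(x, n_segment, fold_div=8, bidirectional=True):
    # x: (N * T, C, H, W) -- the T frames of each clip are stacked
    # along the batch dimension.
    nt, c, h, w = x.size()
    n = nt // n_segment
    x = x.view(n, n_segment, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    if bidirectional:
        # offline TSM: one fold of channels sees the next frame,
        # another fold sees the previous frame, the rest stays put
        out[:, :-1, :fold] = x[:, 1:, :fold]
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    else:
        # online / uni-directional TSM: causal, only the past is
        # available, so a single fold is shifted forward in time
        out[:, 1:, :fold] = x[:, :-1, :fold]
        out[:, :, fold:] = x[:, :, fold:]
    return out.view(nt, c, h, w)
```

The uni-directional version is what the online demo's cached buffers implement: each layer keeps the shifted fold from the previous frame instead of looking at a whole clip.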

NB-Xie commented 3 years ago

> Hi, I used the TSN models they provided; specifically, I used the MobileNet variant because I wanted a low-cost model. Secondly, I did change the bi-directional shift to a uni-directional shift while training.

Hi, I really appreciate your answer! And may I know:

  1. Did you use the "mobilenet_v2_tsm.py" (which includes the buffer) they provided in the demo for the online test?
  2. If you use the bi-directional shift in training but the uni-directional shift in testing, will the result be heavily affected?

I used TSN for training (without changing the bi-directional shift, num_seg = 8), loaded the trained state_dict into the MobileNetV2 in mobilenet_v2_tsm.py (roughly as sketched below), and then modified and ran the main.py in /online_demo.

It turned out that almost all gestures were classified as "no gesture" or "doing other things".
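
Concretely, the loading step I mean looks roughly like this. Treat it as a sketch: the key prefixes (`module.base_model.`, `new_fc.` → `classifier.`) and the `MobileNetV2(n_class=27)` constructor are my assumptions about how the TSN wrapper and mobilenet_v2_tsm.py name things, so print the keys on both sides before trusting it:

```python
import torch
from mobilenet_v2_tsm import MobileNetV2  # the buffered model from online_demo/

# Load the TSN training checkpoint (the path is just an example).
ckpt = torch.load("checkpoint.pth.tar", map_location="cpu")
sd = ckpt.get("state_dict", ckpt)

# Strip the training wrapper's prefixes so the keys match the plain backbone.
# These prefixes are assumptions; adjust them after inspecting sd.keys().
remapped = {}
for k, v in sd.items():
    k = k.replace("module.", "")
    k = k.replace("base_model.", "")         # backbone weights
    k = k.replace("new_fc.", "classifier.")  # final classification layer
    remapped[k] = v

model = MobileNetV2(n_class=27)  # 27 Jester classes
missing, unexpected = model.load_state_dict(remapped, strict=False)
print("missing keys:", missing)        # leftovers here usually mean a prefix mismatch
print("unexpected keys:", unexpected)
```

Even when the keys line up, the bi-directional vs. uni-directional shift mismatch discussed above might still matter.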

@Nauman007 @czqInNanjing thank you

NB-Xie commented 3 years ago

And someone else ran into a similar issue: https://github.com/mit-han-lab/temporal-shift-module/issues/39#issuecomment-820022829

Nauman007 commented 3 years ago

Hi, 1. No, I didn't use the .py file you mentioned; instead, I made my own for the online demo, as I didn't have experience working with ONNX or TVM. 2. I tested the same scenario you mentioned; in my case, the results were not heavily affected. But it did improve the results when I trained the model with the uni-directional shift instead of the bi-directional shift.

NB-Xie commented 3 years ago

> Hi, 1. No, I didn't use the .py file you mentioned; instead, I made my own for the online demo, as I didn't have experience working with ONNX or TVM. 2. I tested the same scenario you mentioned; in my case, the results were not heavily affected. But it did improve the results when I trained the model with the uni-directional shift instead of the bi-directional shift.

Thank you for replying! And how did you set up your online demo? I'm curious whether the input size is set to (1, 3, 224, 224), fed frame by frame (as done in their online_demo), or to (num_seg, 3, 224, 224) as in the training phase (e.g. num_seg = 8)?
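
For context, what I tried follows the frame-by-frame pattern of online_demo/main.py, roughly as below. This is only a sketch: the forward signature `model(x, *shift_buffer)` returning the logits plus updated buffers, and the buffer shapes, are written from memory / as placeholders, so copy the real ones from online_demo/main.py and mobilenet_v2_tsm.py:

```python
import cv2
import torch
from mobilenet_v2_tsm import MobileNetV2  # buffered model from online_demo/

model = MobileNetV2(n_class=27)
# model.load_state_dict(...)  # load the remapped weights before running
model.eval()

# ImageNet normalization, matching training-time preprocessing.
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

# Placeholder buffer shapes -- copy the real zero-initialized buffer list
# from online_demo/main.py.
shift_buffer = [torch.zeros(1, 3, 56, 56), torch.zeros(1, 4, 28, 28)]

cap = cv2.VideoCapture(0)
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        img = cv2.resize(frame, (224, 224))[:, :, ::-1].copy()  # BGR -> RGB
        x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        x = (x - mean) / std  # one frame per call: shape (1, 3, 224, 224)
        # The buffers carry shifted features from previous frames; this is
        # what replaces feeding a whole (num_seg, 3, 224, 224) clip.
        logits, *shift_buffer = model(x, *shift_buffer)
        print(logits.argmax(dim=1).item())
cap.release()
```

So my assumption is (1, 3, 224, 224) per call, with the temporal context carried entirely by the buffers rather than by a num_seg-sized batch. Please correct me if your setup differs.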

andrewwang0612 commented 1 year ago

> Hello there! We are working on the same issue. May I know:

Sorry to bother you!! Did you ever figure out whether the input size should be (1, 3, 224, 224), fed frame by frame (as done in their online_demo), or (num_seg, 3, 224, 224) as in the training phase (e.g. num_seg = 8)?

Appreciate it!!