suarezyan opened this issue 5 years ago
Hi, I'm trying this repo as well and have the same question. I found there is a recognition_demo.py over here. I added the following snippet to pose_demo.yaml under ./configs/pose_estimation/ and generated a recognition_demo.yaml to run action recognition.
recognition_cfg:
  checkpoint_file: mmskeleton://st_gcn/kinetics-skeleton
  type: "models.backbones.ST_GCN_18"
  model_cfg:
    in_channels: 3
    num_class: 400
    edge_importance_weighting: True
    graph_cfg:
      layout: "openpose"
      strategy: "spatial"
I get a similar result to yours for pose inference:
{'joint_preds': array([[[106.735275, 82.21731 ],
[106.735275, 77.04267 ],
[105.01039 , 77.04267 ],
[ 80.862076, 77.04267 ],
[ 89.48647 , 78.76755 ],
[ 48.08936 , 97.74123 ],
[ 87.7616 , 104.64075 ],
[ 17.04152 , 118.43979 ],
[130.88359 , 104.64075 ],
[ 13.59176 , 121.88955 ],
[118.80943 , 80.49243 ],
[ 32.565437, 183.98523 ],
[ 67.063034, 185.7101 ],
[ 41.18984 , 202.95891 ],
[122.25919 , 218.48282 ],
[ 27.390799, 227.10722 ],
[122.25919 , 225.38234 ]]], dtype=float32),
'joint_scores': array([[[0.89862955],
[0.8472556 ],
[0.9685097 ],
[0.6026411 ],
[0.932783 ],
[0.8121992 ],
[0.7233856 ],
[0.5849051 ],
[0.8421459 ],
[0.6456639 ],
[0.8000812 ],
[0.6833744 ],
[0.62968564],
[0.66435623],
[0.8540314 ],
[0.5832541 ],
[0.03457919]]], dtype=float32),
'meta': {'scale': tensor([[0.8279, 1.1039]]),
'rotation': tensor([0.]),
'center': tensor([[ 71.3752, 143.4505]]),
'score': tensor([0.9928])},
'has_return': True,
'person_bbox': array([[ 5.1398487 , 57.465942 , 137.61063 , 229.43515 ,
0.99280256]], dtype=float32),
'frame_index': 183},
In my understanding of the code, we need to parse the results above, feed the necessary parts to the recognizer, and generate the output video ourselves. If you have made some progress, maybe we could have a discussion. :)
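For anyone else stuck at this step, here is a rough sketch (not code from the repo) of how the per-frame results printed above could be packed into the (N, C, T, V, M) tensor that ST_GCN_18 consumes. The frame-size normalization and the single-person assumption (M=1) are guesses on my part, and the 17-vs-18 joint mismatch discussed further down still has to be handled before the pretrained kinetics checkpoint can be used.

# Sketch only: pack per-frame pose results into an ST-GCN style input tensor.
# Assumes one person per frame and channels = (x, y, score); the normalization
# (divide by frame size, center around zero) is an assumption, not repo code.
import numpy as np
import torch

def build_recognizer_input(frames, frame_w, frame_h, num_joints=17):
    """frames: list of per-frame dicts like the `res` shown above."""
    T = len(frames)
    data = np.zeros((3, T, num_joints, 1), dtype=np.float32)  # (C, T, V, M)
    for t, res in enumerate(frames):
        if not res.get('has_return', False):
            continue
        preds = res['joint_preds'][0]          # (V, 2) pixel coordinates
        scores = res['joint_scores'][0, :, 0]  # (V,) confidences
        data[0, t, :, 0] = preds[:, 0] / frame_w - 0.5   # x, roughly centered
        data[1, t, :, 0] = preds[:, 1] / frame_h - 0.5   # y, roughly centered
        data[2, t, :, 0] = scores
    return torch.from_numpy(data).unsqueeze(0)            # (N=1, C, T, V, M)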
@suarezyan @dmvictor Could you explain the meaning of the fields above? How can we run action recognition on the video? Please let me know.
@yysijie , Could you provide instructions on how to classify actions? As far as I understand, the provided output only shows the location of each person.
Has anyone solved this issue? If yes, please let me know.
Hi, are you still working on it? I think there's a lot of work to do. The pose inference outputs 17 keypoints (you can count them :)). In mmskeleton/deprecated/origin_stgcn_repo/net/utils/graph.py there is no layout for 17 keypoints; the old ST-GCN used the OpenPose API, so the graph is built on the 18 keypoints that OpenPose outputs. If you look into the mmskeleton pose estimation demo, you can see that mmskeleton lacks the center node (the OpenPose neck joint). Thus, for the GCN, we have to redesign the sub-graph. If you have any progress, please discuss with me :).
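One possible workaround, sketched below under the assumption that the detector outputs keypoints in the standard COCO-17 order: instead of redesigning the graph, map the 17 COCO keypoints onto the 18-joint OpenPose layout the pretrained kinetics model expects, and synthesize the missing neck as the midpoint of the two shoulders. The index mapping and the neck-score heuristic are my own assumptions, not code from the repo; the converted keypoints could then be fed into something like the packing sketch above.

# Sketch only: convert COCO-17 keypoints into the OpenPose-18 layout by
# synthesizing the neck joint. Assumes the standard COCO keypoint order.
import numpy as np

# OpenPose-18 index -> COCO-17 index (the neck, index 1, has no COCO counterpart)
COCO_TO_OPENPOSE = [0, -1, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]

def coco17_to_openpose18(preds, scores):
    """preds: (17, 2) keypoint coordinates, scores: (17,) confidences."""
    out_xy = np.zeros((18, 2), dtype=np.float32)
    out_sc = np.zeros(18, dtype=np.float32)
    for op_idx, coco_idx in enumerate(COCO_TO_OPENPOSE):
        if coco_idx >= 0:
            out_xy[op_idx] = preds[coco_idx]
            out_sc[op_idx] = scores[coco_idx]
    out_xy[1] = (preds[5] + preds[6]) / 2.0   # neck = midpoint of the shoulders
    out_sc[1] = min(scores[5], scores[6])     # heuristic confidence for the neck
    return out_xy, out_sc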
Hi, I have the same issue. I think we might have to retrain the last layer(s) of the model to work with 17 keypoints instead of 18, because we are now using mmdetection instead of OpenPose (I have not found any pretrained model that would avoid retraining). If anyone comes up with a solution, I would appreciate your comments. Thanks! Luis
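If retraining is the route taken, a 17-joint layout would also need to be added to the Graph class in graph.py. Below is a sketch (not repo code) of what such a layout could look like: the edge list follows the standard COCO skeleton, and picking the left shoulder (index 5) as the graph center is purely an assumption, since COCO has no neck joint.

# Sketch only: a possible 17-joint COCO graph definition for retraining ST-GCN
# on mmdetection/COCO keypoints instead of converting them to OpenPose order.
def coco17_graph():
    num_node = 17
    self_link = [(i, i) for i in range(num_node)]
    neighbor_link = [(0, 1), (0, 2), (1, 3), (2, 4),            # nose, eyes, ears
                     (5, 7), (7, 9), (6, 8), (8, 10), (5, 6),   # shoulders, arms
                     (5, 11), (6, 12), (11, 12),                # torso
                     (11, 13), (13, 15), (12, 14), (14, 16)]    # hips, legs
    center = 5  # assumed center node; COCO has no neck joint
    return num_node, self_link + neighbor_link, center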
Hi guys, did you figure it out? I don't know how to use recognition_demo.yaml either. I would appreciate it if you could show me a pose_demo.yaml that includes the recognition config. Thanks in advance.
Dear authors, we marvel at your project. We would like to ask you three questions:
1. The old ST-GCN demo outputs the recognized action, such as drinking or running, but the mmskeleton demo (mp4) does not output action classification results. Can the results be shown in the video?
2. Does the mmskeleton demo use ST-GCN? Where is the specific processing code?
3. We found the inference function in pose_demo.py. Which field of the res variable represents the result of action classification? Here are the results of our debugging:
(Pdb) p res
{'joint_preds': array([[[312.38754 , 232.0685  ],
        [312.38754 , 219.4858  ],
        [306.0962  , 219.4858  ],
        [262.0567  , 225.77715 ],
        [274.6394  , 225.77715 ],
        [255.76535 , 294.98206 ],
        [243.18263 , 294.98206 ],
        [218.01721 , 370.47833 ],
        [155.10365 , 332.7302  ],
        [280.9308  , 370.47833 ],
        [167.68637 , 439.68323 ],
        [161.39502 , 496.30545 ],
        [199.14314 , 496.30545 ],
        [129.93823 , 622.13257 ],
        [268.34805 , 622.13257 ],
        [ 48.150608, 741.66833 ],
        [224.30856 , 760.54236 ]]], dtype=float32),
 'joint_scores': array([[[0.9012919 ],
        [0.8540952 ],
        [0.9605977 ],
        [0.63796365],
        [0.94471455],
        [0.69215775],
        [0.90001863],
        [0.59691495],
        [0.940477  ],
        [0.8861943 ],
        [0.90650606],
        [0.69944346],
        [0.74769926],
        [0.86359084],
        [0.92158854],
        [0.8741871 ],
        [0.8856789 ]]], dtype=float32),
 'meta': {'scale': tensor([[3.0199, 4.0265]]),
          'rotation': tensor([0.]),
          'center': tensor([[170.8320, 486.8684]]),
          'score': tensor([0.9994])},
 'has_return': True,
 'person_bbox': array([[ 16.82634 , 164.75099 , 324.83774 , 808.98584 , 0.9994229]], dtype=float32)}
Could you explain the meaning of the fields above? Is there any documentation?
We are looking forward to your reply.