suarezyan opened this issue 5 years ago
Hi, I'm trying this repo as well and have the same question. I found there is a recognition_demo.py over here. I added the following snippet to pose_demo.yaml under ./configs/pose_estimation/ and generated a recognition_demo.yaml to run action recognition.
recognition_cfg:
  checkpoint_file: mmskeleton://st_gcn/kinetics-skeleton
  type: "models.backbones.ST_GCN_18"
  model_cfg:
    in_channels: 3
    num_class: 400
    edge_importance_weighting: True
    graph_cfg:
      layout: "openpose"
      strategy: "spatial"
I get a similar result to yours for pose inference:
{'joint_preds': array([[[106.735275, 82.21731 ],
[106.735275, 77.04267 ],
[105.01039 , 77.04267 ],
[ 80.862076, 77.04267 ],
[ 89.48647 , 78.76755 ],
[ 48.08936 , 97.74123 ],
[ 87.7616 , 104.64075 ],
[ 17.04152 , 118.43979 ],
[130.88359 , 104.64075 ],
[ 13.59176 , 121.88955 ],
[118.80943 , 80.49243 ],
[ 32.565437, 183.98523 ],
[ 67.063034, 185.7101 ],
[ 41.18984 , 202.95891 ],
[122.25919 , 218.48282 ],
[ 27.390799, 227.10722 ],
[122.25919 , 225.38234 ]]], dtype=float32),
'joint_scores': array([[[0.89862955],
[0.8472556 ],
[0.9685097 ],
[0.6026411 ],
[0.932783 ],
[0.8121992 ],
[0.7233856 ],
[0.5849051 ],
[0.8421459 ],
[0.6456639 ],
[0.8000812 ],
[0.6833744 ],
[0.62968564],
[0.66435623],
[0.8540314 ],
[0.5832541 ],
[0.03457919]]], dtype=float32),
'meta': {'scale': tensor([[0.8279, 1.1039]]),
'rotation': tensor([0.]),
'center': tensor([[ 71.3752, 143.4505]]),
'score': tensor([0.9928])},
'has_return': True,
'person_bbox': array([[ 5.1398487 , 57.465942 , 137.61063 , 229.43515 ,
0.99280256]], dtype=float32),
'frame_index': 183},
In my understanding of the code, we need to parse the results above, feed the necessary parts to the recognizer, and generate the output video ourselves. If you have made some progress, maybe we could have a discussion. :)
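For anyone else stuck at this step, here is a rough sketch (not code from the repo) of how the per-frame results printed above could be packed into the (N, C, T, V, M) tensor that ST_GCN_18 consumes. The frame-size normalization and the single-person assumption (M=1) are guesses on my part, and the 17-vs-18 joint mismatch discussed further down still has to be handled before the pretrained kinetics checkpoint can be used.

# Sketch only: pack per-frame pose results into an ST-GCN style input tensor.
# Assumes one person per frame and channels = (x, y, score); the normalization
# (divide by frame size, center around zero) is an assumption, not repo code.
import numpy as np
import torch

def build_recognizer_input(frames, frame_w, frame_h, num_joints=17):
    """frames: list of per-frame dicts like the `res` shown above."""
    T = len(frames)
    data = np.zeros((3, T, num_joints, 1), dtype=np.float32)  # (C, T, V, M)
    for t, res in enumerate(frames):
        if not res.get('has_return', False):
            continue
        preds = res['joint_preds'][0]          # (V, 2) pixel coordinates
        scores = res['joint_scores'][0, :, 0]  # (V,) confidences
        data[0, t, :, 0] = preds[:, 0] / frame_w - 0.5   # x, roughly centered
        data[1, t, :, 0] = preds[:, 1] / frame_h - 0.5   # y, roughly centered
        data[2, t, :, 0] = scores
    return torch.from_numpy(data).unsqueeze(0)            # (N=1, C, T, V, M)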
@suarezyan @dmvictor Could you explain the meaning of the fields above? How can we run action recognition on the video? Please let me know.
@yysijie , Could you provide instructions on how to classify actions? As far as I understand, the provided output only shows the location of each person.
Has anyone solved this issue? If yes, please let me know.
Hi, are you still working on it? I think there's a lot of work to do. The pose inference outputs 17 keypoints (you can count them :)). In mmskeleton/deprecated/origin_stgcn_repo/net/utils/graph.py there is no layout for 17 keypoints; the old ST-GCN used the OpenPose API, so the graph is built on the 18 keypoints that OpenPose outputs. If you look into the mmskeleton pose estimation demo, you can see that mmskeleton lacks the center node (the OpenPose neck joint). Thus, for the GCN, we have to redesign the sub-graph. If you have any progress, please discuss with me :).
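One possible workaround, sketched below under the assumption that the detector outputs keypoints in the standard COCO-17 order: instead of redesigning the graph, map the 17 COCO keypoints onto the 18-joint OpenPose layout the pretrained kinetics model expects, and synthesize the missing neck as the midpoint of the two shoulders. The index mapping and the neck-score heuristic are my own assumptions, not code from the repo; the converted keypoints could then be fed into something like the packing sketch above.

# Sketch only: convert COCO-17 keypoints into the OpenPose-18 layout by
# synthesizing the neck joint. Assumes the standard COCO keypoint order.
import numpy as np

# OpenPose-18 index -> COCO-17 index (the neck, index 1, has no COCO counterpart)
COCO_TO_OPENPOSE = [0, -1, 6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15, 2, 1, 4, 3]

def coco17_to_openpose18(preds, scores):
    """preds: (17, 2) keypoint coordinates, scores: (17,) confidences."""
    out_xy = np.zeros((18, 2), dtype=np.float32)
    out_sc = np.zeros(18, dtype=np.float32)
    for op_idx, coco_idx in enumerate(COCO_TO_OPENPOSE):
        if coco_idx >= 0:
            out_xy[op_idx] = preds[coco_idx]
            out_sc[op_idx] = scores[coco_idx]
    out_xy[1] = (preds[5] + preds[6]) / 2.0   # neck = midpoint of the shoulders
    out_sc[1] = min(scores[5], scores[6])     # heuristic confidence for the neck
    return out_xy, out_sc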
Hi, I have the same issue. I think we might have to retrain the last layer(s) of the model to work with 17 keypoints instead of 18, because we are now using mmdetection instead of OpenPose (I have not found any pretrained model that would avoid retraining). If anyone comes up with a solution, I would appreciate your comments. Thanks! Luis
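If retraining is the route taken, a 17-joint layout would also need to be added to the Graph class in graph.py. Below is a sketch (not repo code) of what such a layout could look like: the edge list follows the standard COCO skeleton, and picking the left shoulder (index 5) as the graph center is purely an assumption, since COCO has no neck joint.

# Sketch only: a possible 17-joint COCO graph definition for retraining ST-GCN
# on mmdetection/COCO keypoints instead of converting them to OpenPose order.
def coco17_graph():
    num_node = 17
    self_link = [(i, i) for i in range(num_node)]
    neighbor_link = [(0, 1), (0, 2), (1, 3), (2, 4),            # nose, eyes, ears
                     (5, 7), (7, 9), (6, 8), (8, 10), (5, 6),   # shoulders, arms
                     (5, 11), (6, 12), (11, 12),                # torso
                     (11, 13), (13, 15), (12, 14), (14, 16)]    # hips, legs
    center = 5  # assumed center node; COCO has no neck joint
    return num_node, self_link + neighbor_link, center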
Hi guys, did you figure it out? I don't know how to use recognition_demo.yaml either. I would appreciate it if you could show me a pose_demo.yaml that includes the recognition config. Thanks in advance.
Dear authors, we marvel at your project. We would like to ask you three questions:
1. The old ST-GCN demo outputs the recognized action, such as drinking or running, but the mmskeleton demo (mp4) does not output action classification results. Can the results be shown in the video?
2. Does the mmskeleton demo use ST-GCN? Where is the specific processing code?
3. We found the inference function in pose_demo.py. Which field of the res variable represents the result of action classification? Here are the results of our debugging:
(Pdb) p res
{'joint_preds': array([[[312.38754 , 232.0685  ],
        [312.38754 , 219.4858  ],
        [306.0962  , 219.4858  ],
        [262.0567  , 225.77715 ],
        [274.6394  , 225.77715 ],
        [255.76535 , 294.98206 ],
        [243.18263 , 294.98206 ],
        [218.01721 , 370.47833 ],
        [155.10365 , 332.7302  ],
        [280.9308  , 370.47833 ],
        [167.68637 , 439.68323 ],
        [161.39502 , 496.30545 ],
        [199.14314 , 496.30545 ],
        [129.93823 , 622.13257 ],
        [268.34805 , 622.13257 ],
        [ 48.150608, 741.66833 ],
        [224.30856 , 760.54236 ]]], dtype=float32),
 'joint_scores': array([[[0.9012919 ],
        [0.8540952 ],
        [0.9605977 ],
        [0.63796365],
        [0.94471455],
        [0.69215775],
        [0.90001863],
        [0.59691495],
        [0.940477  ],
        [0.8861943 ],
        [0.90650606],
        [0.69944346],
        [0.74769926],
        [0.86359084],
        [0.92158854],
        [0.8741871 ],
        [0.8856789 ]]], dtype=float32),
 'meta': {'scale': tensor([[3.0199, 4.0265]]),
          'rotation': tensor([0.]),
          'center': tensor([[170.8320, 486.8684]]),
          'score': tensor([0.9994])},
 'has_return': True,
 'person_bbox': array([[ 16.82634 , 164.75099 , 324.83774 , 808.98584 , 0.9994229]], dtype=float32)}
Could you explain the meaning of the fields above? Is there any documentation?
We are looking forward to your reply.