liujf69 / TD-GCN-Gesture

[TMM 2024] Implementation of the paper “Temporal Decoupling Graph Convolutional Network for Skeleton-based Gesture Recognition”.
https://ieeexplore.ieee.org/document/10113233

Train model on 2D Pose Data #11

Closed · snalyami closed this issue 3 weeks ago

snalyami commented 2 months ago

Hi, many thanks for your awesome work! I'm wondering if it would be possible to apply the model to 2D pose data estimated from videos, for example with HRNet. Do you think this would be possible, and how would you recommend processing the data?

liujf69 commented 2 months ago

Generally, the model can be used with 2D poses obtained from HRNet. You need to modify the following key parts:

1. Define the adjacency matrix of the skeleton graph.

```python
# Here, I have provided one example.
# In fact, the number of bones obtained from the COCO dataset is more than 16,
# but I have only selected 16 of them here.
import sys
import numpy as np

sys.path.extend(['../'])
from graph import tools

num_node = 17
self_link = [(i, i) for i in range(num_node)]
inward_ori_index = [(1, 6), (2, 1), (3, 1), (6, 7), (7, 1), (8, 6), (9, 7), (10, 8),
                    (11, 9), (12, 6), (13, 7), (14, 12), (15, 13), (16, 14), (17, 15), (13, 12)]
inward = [(i - 1, j - 1) for (i, j) in inward_ori_index]
outward = [(j, i) for (i, j) in inward]
neighbor = inward + outward

class Graph:
    def __init__(self, labeling_mode='spatial'):
        self.num_node = num_node
        self.self_link = self_link
        self.inward = inward
        self.outward = outward
        self.neighbor = neighbor
        self.A = self.get_adjacency_matrix(labeling_mode)

    def get_adjacency_matrix(self, labeling_mode=None):
        if labeling_mode is None:
            return self.A
        if labeling_mode == 'spatial':
            A = tools.get_spatial_graph(num_node, self_link, inward, outward)
        else:
            raise ValueError()
        return A
```
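As a quick sanity check (my assumption: `tools.get_spatial_graph` stacks the self-link, inward, and outward partitions, as in other ST-GCN-style graph files), you can instantiate the class above and confirm the adjacency tensor has one 17×17 matrix per partition:

```python
# Hypothetical check, assuming the class above is saved as graph/coco.py
g = Graph(labeling_mode='spatial')
print(g.A.shape)  # expected: (3, 17, 17) -> (partitions, num_node, num_node)
```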
2. Change the input dimensions of the network.

```python
# line 157 in tdgcn.py
if in_channels == 2 or in_channels == 9: # change to 2, because the original data is a 2D pose
    self.rel_channels = 8
    self.mid_channels = 16

# line 288 in tdgcn.py
def __init__(self, num_class=60, num_point=25, num_person=2, graph=None, graph_args=dict(), in_channels=2,
             drop_out=0, adaptive=True): # change in_channels to 2 for the 2D pose input
```
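Regarding how to process the HRNet output itself, here is a minimal sketch of packing per-frame keypoints into the (N, C, T, V, M) layout that skeleton-GCN feeders typically expect. The function name, the fixed-length padding, and the single-person assumption are mine, not the repo's actual feeder logic, so adapt it to the data loader you use:

```python
import numpy as np

def hrnet_to_gcn_input(keypoints, max_frames=64):
    """Hypothetical helper: pack HRNet keypoints of shape (T, 17, 3) -> (x, y, score)
    into one single-person sample of shape (C=2, T, V=17, M=1)."""
    T = min(len(keypoints), max_frames)
    data = np.zeros((2, max_frames, 17, 1), dtype=np.float32)
    # keep only (x, y); drop the confidence-score channel
    data[:, :T, :, 0] = keypoints[:T, :, :2].transpose(2, 0, 1)
    return data

# Example with fake HRNet output for 40 frames
fake_kpts = np.random.rand(40, 17, 3).astype(np.float32)
sample = hrnet_to_gcn_input(fake_kpts)
print(sample.shape)  # (2, 64, 17, 1); stack samples to get (N, 2, 64, 17, 1)
```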
3. Modify the hyperparameter settings to maintain the model's performance on action recognition.

```python
# line 240 in tdgcn.py
self.beta = nn.Parameter(torch.tensor(0.5)) # 1.0 1.4 2.0 # try 1.0 or 2.0 for action recognition
self.gamma = nn.Parameter(torch.tensor(0.1))
```
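After making these three changes, a small smoke test can confirm the shapes line up. This sketch assumes the network class is `Model` in `model/tdgcn.py` and that it accepts the graph as an import-path string (as in ST-GCN-derived codebases); the module path `graph.coco.Graph`, the class counts, and the batch values are placeholder assumptions:

```python
import torch
from model.tdgcn import Model  # import path assumed; adjust to this repo's layout

# Hypothetical smoke test with toy values: 2D-pose input and the COCO graph above,
# assuming the Graph class was saved as graph/coco.py.
net = Model(num_class=14, num_point=17, num_person=1,
            graph='graph.coco.Graph', in_channels=2)
x = torch.randn(2, 2, 64, 17, 1)  # (N, C=2, T, V=17, M=1)
print(net(x).shape)               # expected: (2, num_class)
```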

    I hope my answer can help you!