open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

Regarding the implementation of poseC3D considering both RGB and pose input #1221

Closed: Jeba-create closed this issue 2 years ago

kennymckormick commented 3 years ago

Hi, Jeba-create, the implementation of poseC3D considering both RGB and pose input has not been released yet.

Jeba-create commented 3 years ago

I have a couple of questions:

  1. I have tried to create a model for a SlowFast network (poseC3D with RGB and pose pathways) in a separate script, but it throws an error saying the file is not included in the register_module. If I want to modify the existing model, what steps should I follow?
  2. How should I use the data loader if I want to feed two modalities of input?

I am working on my own dataset. I have built the datasets separately, using the following lines for the two pathways:

```python
cfg_pose = Config.fromfile('open-mmlab/mmaction2-master/configs/skeleton/posec3d/slowonly_r50_u48_240e_iitDelhi_keypoint.py')
datasets_pose = [build_dataset(cfg_pose.data.train)]

cfg_vid = Config.fromfile('open-mmlab/mmaction2-master/configs/recognition/slowfast/slowfast_r50_video_iitDelhi_rgbpose.py')
datasets_vid = [build_dataset(cfg_vid.data.train)]
```

Also, I have modified the slowfast network according to my requirements.

Is it enough to build the datasets separately, or should they be built together before being fed into the model for training?

Thanks in advance

Jeba-create commented 3 years ago

I have used your slowposeC3D model and it works well. Now I want to extend it to multiple modalities, and I have to complete the project in two weeks, so I have started implementing the code using your toolbox. A few clarifications would help me a lot. There is a separate pipeline (a video loader) for RGB input and a pose loader for skeletal joints (heatmaps), and there are separate config files for the two modalities. Does that make sense, or would the data loader still work if I integrated both into a single config file and placed the data in the form of a list or dictionary? (A rough sketch of what I mean follows.)
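To make the question concrete, here is a rough sketch of pairing the two built datasets so that a single sample carries both modalities. The class name PairedRGBPoseDataset and the key 'heatmap_imgs' are just placeholders, and it assumes both annotation lists index the same clips in the same order:

```python
# A sketch only: names are placeholders, and both datasets are assumed
# to list the same clips in the same order.
from torch.utils.data import Dataset

class PairedRGBPoseDataset(Dataset):
    def __init__(self, rgb_dataset, pose_dataset):
        assert len(rgb_dataset) == len(pose_dataset)
        self.rgb_dataset = rgb_dataset
        self.pose_dataset = pose_dataset

    def __len__(self):
        return len(self.rgb_dataset)

    def __getitem__(self, idx):
        rgb_sample = self.rgb_dataset[idx]
        pose_sample = self.pose_dataset[idx]
        # one dictionary carrying both modalities and a single label
        return dict(imgs=rgb_sample['imgs'],
                    heatmap_imgs=pose_sample['imgs'],
                    label=rgb_sample['label'])
```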

kennymckormick commented 3 years ago

Hi, Jeba-create, if you want to implement the RGBPoseSlowFast in the PoseC3D paper on your own:

  1. You need to create a new dataset that provides samples consisting of RGB videos and 2D skeletons (for example, the video file path and the skeleton in a single dictionary). (Register it in DATASETS.)
  2. You need to create components in the data pipeline to process such samples. (Register them in PIPELINES.)
  3. You need to create a two-stream backbone that takes both RGB frames and heatmap volumes (maybe in a tuple) as input. (Register it in BACKBONES; a minimal sketch follows this list.)
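For illustration, a minimal sketch of step 3, not the official implementation: the registry import path assumes mmaction2 0.x, and the class name is hypothetical.

```python
# A sketch only: RGBPoseTwoStream is a hypothetical name, and the registry
# import path assumes mmaction2 0.x.
import torch.nn as nn
from mmaction.models import BACKBONES, build_backbone

@BACKBONES.register_module()
class RGBPoseTwoStream(nn.Module):
    """A two-stream backbone taking a (rgb_clip, heatmap_volume) tuple as input."""

    def __init__(self, rgb_backbone, pose_backbone):
        super().__init__()
        # both pathways are built from ordinary backbone config dicts
        self.rgb_path = build_backbone(rgb_backbone)
        self.pose_path = build_backbone(pose_backbone)

    def forward(self, inputs):
        # e.g. rgb: (N, 3, T, H, W), heatmaps: (N, K, T', H', W')
        rgb, heatmaps = inputs
        return self.rgb_path(rgb), self.pose_path(heatmaps)
```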

Implementing RGBPoseSlowFast requires some effort. If you just need RGB+Pose-based predictions, you can fuse the predictions of the two individual streams directly, as in the sketch below.
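A minimal sketch of that late-fusion route: each stream is trained and tested separately, and the per-clip class scores (names and weights below are placeholders) are averaged.

```python
# A sketch only: rgb_scores / pose_scores are per-clip class-score arrays
# obtained by testing each stream separately; the weights are placeholders.
import numpy as np

def fuse_scores(rgb_scores, pose_scores, rgb_weight=1.0, pose_weight=1.0):
    """Weighted late fusion of class scores with shape (num_clips, num_classes)."""
    fused = rgb_weight * np.asarray(rgb_scores) + pose_weight * np.asarray(pose_scores)
    return fused.argmax(axis=1)  # fused top-1 predictions
```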

Jeba-create commented 3 years ago

Thank you so much for your response.

rlleshi commented 3 years ago

Do you have a rough timeline for when RGBPoseSlowFast will be released?

kennymckormick commented 3 years ago

> Do you have a rough timeline for when RGBPoseSlowFast will be released?

Sorry, we do not have a plan for releasing it right now; we may release it in December or early next year.

Jeba-create commented 3 years ago

With some difficulty, I have coded the RGBPoseSlowFast network and started training it. The data from both modalities are arranged in the same order so that both loaders access the same file. The code is given below:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from mmcv import Config
from mmaction.datasets import build_dataset

# RgbPoseSlowFastResNet50 is my own two-stream network, defined elsewhere

# build the pose and RGB datasets from their separate configs
cfg_pose = Config.fromfile('open-mmlab/configs/skeleton/posec3d/slowonly_r50_dataset.py')
datasets_pose_trn = [build_dataset(cfg_pose.data.train)]
datasets_pose_val = [build_dataset(cfg_pose.data.val)]

cfg_vid = Config.fromfile('open-mmlab/configs/recognition/slowfast/slowfast_r50_dataset.py')
datasets_vid_trn = [build_dataset(cfg_vid.data.train)]
datasets_vid_val = [build_dataset(cfg_vid.data.val)]

# shuffle=False keeps both loaders in the same sample order
train_loader_pose = torch.utils.data.DataLoader(datasets_pose_trn[0], shuffle=False, batch_size=2)
train_loader_vid = torch.utils.data.DataLoader(datasets_vid_trn[0], shuffle=False, batch_size=2)

val_loader_pose = torch.utils.data.DataLoader(datasets_pose_val[0], shuffle=False, batch_size=1)
val_loader_vid = torch.utils.data.DataLoader(datasets_vid_val[0], shuffle=False, batch_size=1)
print('val_loader_pose', len(val_loader_pose), val_loader_pose)

# initialize the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = RgbPoseSlowFastResNet50().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.0002, momentum=0.9, weight_decay=0.0003)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
min_val_loss = np.inf

for epoch in range(500):  # loop over the dataset multiple times
    train_loss = 0.0
    total_trn, correct_trn = 0, 0
    net.train()
    for input1, input2 in zip(train_loader_vid, train_loader_pose):
        data1 = torch.squeeze(input1['imgs'], 1).to(device)
        label1 = torch.squeeze(input1['label']).to(device)
        data2 = torch.squeeze(input2['imgs'], 1).to(device)
        label2 = torch.squeeze(input2['label']).to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(data1, data2)
        loss = criterion(outputs, label1)
        loss.backward()
        optimizer.step()

        # accumulate loss
        train_loss += loss.item()

        # compute accuracy
        _, predicted = torch.max(outputs, 1)
        total_trn += label1.size(0)
        correct_trn += (predicted == label1).sum().item()

    # step the LR schedule once per epoch; stepping it per batch (as before)
    # decays the learning rate every 10 batches instead of every 10 epochs
    scheduler.step()

    val_loss = 0.0
    total_val, correct_val = 0, 0
    net.eval()
    with torch.no_grad():
        for input1, input2 in zip(val_loader_vid, val_loader_pose):
            data1 = torch.squeeze(input1['imgs'], 1).to(device)
            label1 = input1['label'].to(device)
            data2 = torch.squeeze(input2['imgs'], 1).to(device)
            label2 = input2['label'].to(device)

            # forward
            outputs = net(data1, data2)
            loss = criterion(outputs, label1)
            val_loss += loss.item()

            # compute accuracy
            _, predicted = torch.max(outputs, 1)
            total_val += label1.size(0)
            correct_val += (predicted == label1).sum().item()

    print(f'Epoch {epoch + 1} \t\t Train Loss: {train_loss / len(train_loader_vid)} '
          f'\t\t Val Loss: {val_loss / len(val_loader_vid)} '
          f'\t\t Train acc: {correct_trn / total_trn} \t\t Val acc: {correct_val / total_val}')

    if min_val_loss > val_loss:
        print(f'Validation Loss Decreased({min_val_loss:.6f}--->{val_loss:.6f}) \t Saving The Model')
        min_val_loss = val_loss
        filewrt = 'savedmodel' + str(epoch) + '.pth'
        # saving state dict
        torch.save(net.state_dict(), filewrt)
```

When I use the slowposeC3D model, the training loss decreases consistently every epoch. But when I train my model, the training loss bounces back and forth and does not decrease consistently, as shown below.

[image: training-loss curve]

Could you please help me to improve the performance of the model?

kennymckormick commented 3 years ago

Hi, Jeba, the paper says you need to first train the two streams separately, then jointly finetune them to achieve better performance. Training RGBPoseSlowFast from scratch may not lead to good results.

Jeba-create commented 3 years ago

OK, thank you so much for the suggestion. One more question: since I have already trained the slowposeC3D model, is it possible to use those weights in the pose pathway?

kennymckormick commented 3 years ago

Yes, you should use the weights of slowposeC3D to initialize the pose pathway, for example along the lines of the sketch below.
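For illustration, a minimal sketch of that initialization. It assumes the checkpoint is a standard .pth file wrapping the weights in a 'state_dict' entry, that backbone keys carry a 'backbone.' prefix, and that the two-stream network from the earlier snippet exposes its pose stream as net.pose_path; the filename and attribute names are placeholders.

```python
# A sketch only: the checkpoint filename is a placeholder, and net.pose_path
# is assumed to mirror the slowposeC3D backbone; real key prefixes may differ.
import torch

ckpt = torch.load('slowposec3d_checkpoint.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)  # unwrap if the weights are nested

# keep only the backbone weights and strip the 'backbone.' prefix so the
# keys line up with the pose pathway of the two-stream model
pose_state = {k[len('backbone.'):]: v
              for k, v in state.items() if k.startswith('backbone.')}

# strict=False tolerates layers that exist in only one of the two models
missing, unexpected = net.pose_path.load_state_dict(pose_state, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)
```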

housong404 commented 2 years ago

> Sorry, we do not have a plan for releasing it right now; we may release it in December or early next year.

Hello, I want to use the RGBPose-SlowFast part, but I didn't find it anywhere else, and I tried to build it but failed. Do you have a rough timeline for when RGBPoseSlowFast will be released? Thank you.

housong404 commented 2 years ago

> When I use the slowposeC3D model, the training loss decreases consistently every epoch. But when I train my model, the training loss bounces back and forth and does not decrease consistently.

Hello, I want to use the RGBPose-SlowFast part, but I didn't find it anywhere else, and I tried to build it but failed. Can I refer to how you built the RgbPoseSlowFastResNet50 network? Thank you.

kennymckormick commented 2 years ago

> When I use the slowposeC3D model, the training loss decreases consistently every epoch. But when I train my model, the training loss bounces back and forth and does not decrease consistently.
>
> Hello, I want to use the RGBPose-SlowFast part, but I didn't find it anywhere else, and I tried to build it but failed. Can I refer to how you built the RgbPoseSlowFastResNet50 network? Thank you.

Hi, the rest of the PoseC3D project (RGBPose-SlowFast and PoseC3D + Kinetics) will be released after the paper is published.

housong404 commented 2 years ago

> When I use the slowposeC3D model, the training loss decreases consistently every epoch. But when I train my model, the training loss bounces back and forth and does not decrease consistently.
>
> Hello, I want to use the RGBPose-SlowFast part, but I didn't find it anywhere else, and I tried to build it but failed. Can I refer to how you built the RgbPoseSlowFastResNet50 network? Thank you.
>
> Hi, the rest of the PoseC3D project (RGBPose-SlowFast and PoseC3D + Kinetics) will be released after the paper is published.

Thank you very much for your reply. I will keep trying to build the RGBPose-SlowFast model by referring to your paper. Thank you.

Xinxinatg commented 2 years ago

Hi, did you try using the NTU RGB+D dataset to train the model with RGB and skeleton keypoint data?

Xinxinatg commented 2 years ago

> When I use the slowposeC3D model, the training loss decreases consistently every epoch. But when I train my model, the training loss bounces back and forth and does not decrease consistently.
>
> Hello, I want to use the RGBPose-SlowFast part, but I didn't find it anywhere else, and I tried to build it but failed. Can I refer to how you built the RgbPoseSlowFastResNet50 network? Thank you.

Hi, have you had any luck implementing the model with RGB and skeleton data?

Jeba-create commented 1 year ago

Sorry for the late response. I haven't tried it with the NTU RGB+D dataset, but I have replicated the model on my own and, luckily, got good performance. Thank you.

kennymckormick commented 1 year ago

Now you can find the official implementation of RGBPoseConv3D in PYSKL: https://github.com/kennymckormick/pyskl