hongluzhou / composer

Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
31 stars 5 forks source link

Unable to reproduce the results #11

Open sanshuiii opened 12 months ago

sanshuiii commented 12 months ago
We tried to reproduce the results on Original Volleyball Dataset, but failed. We use the 2 stage training strategy as mentioned in the readme, and run it twice with the following result: No. stage 1 acc stage 2 acc
paper 93.7%
1 Test Prec@1: 93.044 % Test Prec@1: 93.119 %
2 Test Prec@1: 93.044 % Test Prec@1: 92.595 %

I am wondering if we are using the wrong configs? Since at our second trial, the 2nd stage acc even decreases. The configs and scripts used are as follows:

python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth

stage 1

exp_name: composer_vd_original

# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: True

# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001

# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True

# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1

# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0

# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0

# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/

stage 2

exp_name: composer_vd_original

# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: False

# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001

# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True

# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1

# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0

# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0

# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
JinZhangYu commented 9 months ago

I also meet with the same problem. Here are the results I reproduced. image

But I at least understand what your problem is. You are using 4 GPUs, but in default .yaml it is 1 GPU. That means you multiply the batch size by 4 but you do not change the learning rate.

ztstctc commented 6 days ago

我们尝试在 Original Volleyball Dataset 上重现结果,但失败了。我们使用自述文件中提到的 2 阶段训练策略,并运行两次,结果如下:

不。 第一阶段 累积 第 2 阶段 ACC 纸 93.7% 1 测试Prec@1:93.044 % 测试 Prec@1: 93.119 % 2 测试Prec@1:93.044 % 测试 Prec@1: 92.595 % 我想知道我们是否使用了错误的配置?因为在我们的第二次试验中,第二阶段的 acc 甚至下降了。使用的配置和脚本如下:

python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth

第一阶段

exp_name: composer_vd_original

# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: True

# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001

# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True

# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1

# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0

# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0

# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/

第 2 阶段

exp_name: composer_vd_original

# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: False

# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001

# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True

# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1

# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0

# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0

# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/

Why is it that when I set a gpu 0 to gpu 0, 1, it reports an error with a NAN value? And I have no way to run it on gpu 1, only gpu 0.