Closed: josebert2 closed this issue 2 years ago.
Thanks for your attention.
Once we have time, we will make a complete update to this repo, adding some new features (e.g., more advanced techniques, a TokenPose yaml, etc.). Sorry that it may take a relatively long time (the author has been quite busy recently).
FYI, TokenPose is already released at https://github.com/leeyegy/TokenPose. To apply this work to TokenPose, you may just change the training target from a heatmap to the proposed representation and adopt the corresponding loss function, which is technically easy to implement.
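To illustrate "changing the training target": in SimDR each joint is supervised with two 1-D classification vectors (one per axis) instead of a 2-D heatmap. A minimal sketch of target generation, assuming a one-hot variant (the function name and signature here are illustrative, not the repo's actual code; the paper also uses a smoothed variant):

```python
import numpy as np

def generate_simdr_target(joints, image_size=(192, 256), split_ratio=2.0):
    """Sketch of SimDR-style targets: for each joint, two 1-D vectors
    over quantized x- and y-coordinate bins instead of a 2-D heatmap.

    joints: (num_joints, 2) array of (x, y) pixel coordinates.
    Returns (target_x, target_y) with shapes
    (num_joints, w * split_ratio) and (num_joints, h * split_ratio).
    """
    w, h = image_size
    num_joints = joints.shape[0]
    target_x = np.zeros((num_joints, int(w * split_ratio)), dtype=np.float32)
    target_y = np.zeros((num_joints, int(h * split_ratio)), dtype=np.float32)
    for j, (x, y) in enumerate(joints):
        # quantize each coordinate into its bin (one-hot here)
        target_x[j, int(x * split_ratio)] = 1.0
        target_y[j, int(y * split_ratio)] = 1.0
    return target_x, target_y
```

With `split_ratio=2.0` a 192x256 input yields 384 x-bins and 512 y-bins, so sub-pixel localization is possible at decode time.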
Thanks for the tip, I got the training process of TokenPose up and running! One disadvantage I encountered with TokenPose is that the pretrained HRNet models are unusable with the current network architecture. Apparently the pretrained model comes with a 'final_layer' which can't be found in the TokenPose architecture.
Ultimately I ran into another problem: the validation loss does not decrease at all while training the model. So I did some digging and found the following passage in the corresponding SimDR paper:
> But for HRNet [29] and TokenPose [17], they have no extra independent modules as the decoder. To apply SimDR to them, we directly append an extra linear layer to the original HRNet and replace the MLP head of TokenPose with a linear layer.
For now I just added an MLP head for both the x and y positions at the end of the TokenPose model, but didn't remove anything. This allows me to train it, but as I said, the model does not learn. I am not entirely sure where those modifications to the model have to be made.
So in the end it isn't enough to simply adapt the loss function and change the training target.
Hi there~ here comes the advice: replace the mlp_head in the original TokenPose with two classifiers (for x- and y-coordinate classification, respectively). Both classifiers accept the output keypoint token embeddings as input and generate the corresponding predictions. Taking the classifier for x-coordinate classification as an example, it is a simple linear layer: `nn.Linear(embedding_size, x_border * splitting_factor)`, where `x_border * splitting_factor` is the number of bins for the x coordinate.
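The advice above can be sketched in PyTorch roughly as follows. The module and argument names here are illustrative, not the repo's actual code; I assume `x_border`/`y_border` are the input width/height and `split_ratio` corresponds to `SIMDR_SPLIT_RATIO`:

```python
import torch
import torch.nn as nn

class SimDRHead(nn.Module):
    """Illustrative replacement for TokenPose's mlp_head: two independent
    linear classifiers over the output keypoint token embeddings."""
    def __init__(self, embedding_size, x_border, y_border, split_ratio=2.0):
        super().__init__()
        self.x_classifier = nn.Linear(embedding_size, int(x_border * split_ratio))
        self.y_classifier = nn.Linear(embedding_size, int(y_border * split_ratio))

    def forward(self, keypoint_tokens):
        # keypoint_tokens: (batch, num_joints, embedding_size)
        pred_x = self.x_classifier(keypoint_tokens)  # (batch, num_joints, x bins)
        pred_y = self.y_classifier(keypoint_tokens)  # (batch, num_joints, y bins)
        return pred_x, pred_y

def decode(pred_x, pred_y, split_ratio=2.0):
    # at inference: map the winning bin index back to a pixel coordinate
    x = pred_x.argmax(dim=-1).float() / split_ratio
    y = pred_y.argmax(dim=-1).float() / split_ratio
    return x, y
```

The key point is that both heads operate on the same token embeddings; only the number of output bins differs between the x- and y-axis.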
Hope these may help, and if there is still any question, plz let us know~:)
Thank you very much, you were really helpful :) I noticed that after 60-70 epochs the network starts overfitting on the MPII dataset; up until that point, the PCK accuracy is around 80%. Should I just stop the training process at this point, or is there still room for improvement?
Maybe I did something wrong? What I did was replace the mlp_head with two classifiers, for the x and y coordinates, which gave me two linear layers: `nn.Linear(embedding_size, x_border * splitting_factor)` and `nn.Linear(embedding_size, y_border * splitting_factor)`.
I chose the following values for the MLP heads:
Furthermore, I adopted the files from the SimDR repository that are necessary for the calculations (train/valid, loss, dataset, transforms, ...).
Here is my yaml; maybe you can spot something off:
```yaml
AUTO_RESUME: true
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
DATA_DIR: ''
GPUS: (0,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 24
PRINT_FREQ: 100
DATASET:
  COLOR_RGB: true
  DATASET: mpii
  DATA_FORMAT: jpg
  FLIP: true
  NUM_JOINTS_HALF_BODY: 8
  PROB_HALF_BODY: -1.0
  ROOT: '/home/data/mpii'
  ROT_FACTOR: 30
  SCALE_FACTOR: 0.25
  TEST_SET: valid
  TRAIN_SET: train
MODEL:
  INIT_WEIGHTS: true
  NAME: pose_tokenpose_s
  NUM_JOINTS: 16
  PRETRAINED: ''
  HEAD_INPUT: 192
  # TARGET_TYPE: gaussian
  COORD_REPRESENTATION: 'simdr'
  SIMDR_SPLIT_RATIO: 2.0
  TRANSFORMER_DEPTH: 12
  TRANSFORMER_HEADS: 8
  TRANSFORMER_MLP_RATIO: 3
  POS_EMBEDDING_TYPE: 'sine-full'
  INIT: true
  DIM: 192 # 4*4*3
  PATCH_SIZE:
  - 3
  - 4
  IMAGE_SIZE:
  - 192
  - 256
  HEATMAP_SIZE:
  - 48
  - 64
  SIGMA: 2
  EXTRA:
    PRETRAINED_LAYERS:
    - 'conv1'
    - 'bn1'
    - 'conv2'
    - 'bn2'
    - 'layer1'
    - 'transition1'
    - 'stage2'
    - 'transition2'
    - 'stage3'
    - 'transition3'
    - 'stage4'
    FINAL_CONV_KERNEL: 1
    STAGE2:
      NUM_MODULES: 1
      NUM_BRANCHES: 2
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      NUM_CHANNELS:
      - 32
      - 64
      FUSE_METHOD: SUM
    STAGE3:
      NUM_MODULES: 4
      NUM_BRANCHES: 3
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      - 4
      NUM_CHANNELS:
      - 32
      - 64
      - 128
      FUSE_METHOD: SUM
    STAGE4:
      NUM_MODULES: 3
      NUM_BRANCHES: 4
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      - 4
      - 4
      NUM_CHANNELS:
      - 32
      - 64
      - 128
      - 256
      FUSE_METHOD: SUM
LOSS:
  USE_TARGET_WEIGHT: true
  TYPE: 'NMTNORMCritierion'
  LABEL_SMOOTHING: 0.2
TRAIN:
  BATCH_SIZE_PER_GPU: 16
  SHUFFLE: true
  BEGIN_EPOCH: 0
  END_EPOCH: 100
  OPTIMIZER: adam
  LR: 0.0001
  LR_FACTOR: 0.1
  LR_STEP:
  - 200
  - 260
  WD: 0.0001
  GAMMA1: 0.99
  GAMMA2: 0.0
  MOMENTUM: 0.9
  NESTEROV: false
TEST:
  BATCH_SIZE_PER_GPU: 16
  COCO_BBOX_FILE: '/data/dataset/COCO_2017/COCO_val2017_detections_AP_H_56_person.json'
  BBOX_THRE: 1.0
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: ''
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  USE_GT_BBOX: true
  FLIP_TEST: true
  POST_PROCESS: true
  SHIFT_HEATMAP: true
  BLUR_KERNEL: 11
DEBUG:
  DEBUG: true
  SAVE_BATCH_IMAGES_GT: true
  SAVE_BATCH_IMAGES_PRED: true
  SAVE_HEATMAPS_GT: true
  SAVE_HEATMAPS_PRED: true
```
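On `LOSS.TYPE: 'NMTNORMCritierion'`: as far as I understand, this is a label-smoothed KL-divergence criterion borrowed from neural machine translation, applied per coordinate axis. A rough self-contained sketch of the idea (class name and signature here are my own, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothedKLLoss(nn.Module):
    """Sketch of the idea behind NMTNORMCritierion: treat each coordinate
    as a classification over bins and minimize the KL divergence between
    the predicted distribution and a label-smoothed one-hot target."""
    def __init__(self, label_smoothing=0.2):
        super().__init__()
        self.label_smoothing = label_smoothing

    def forward(self, logits, target_idx):
        # logits: (N, num_bins); target_idx: (N,) ground-truth bin indices
        num_bins = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        # smoothed target: (1 - eps) on the true bin, eps spread over the rest
        smooth = torch.full_like(log_probs,
                                 self.label_smoothing / (num_bins - 1))
        smooth.scatter_(-1, target_idx.unsqueeze(-1),
                        1.0 - self.label_smoothing)
        return F.kl_div(log_probs, smooth, reduction='batchmean')
```

The `LABEL_SMOOTHING: 0.2` value in the config above would correspond to `label_smoothing=0.2` here.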
It seems that you changed the original training settings, which may be the reason for this drop. For example,
Hey, first of all, thanks for the amazing work!
I observed that the model/experiment files for the TokenPose implementation are missing, even though you present those results in your paper and in this repo. Can we hope for those files to be uploaded anytime soon? I was not able to include TokenPose in the present code base myself.