scnuhealthy / Tensorflow_PersonLab

Tensorflow implementation of PersonLab (https://arxiv.org/abs/1803.08225)

Some confusion about config.py #6

Open KaiChen1998 opened 4 years ago

KaiChen1998 commented 4 years ago

First of all, thank you so much for providing such great code! I just don't understand some parameters you use in your code, especially in your data augmentation part.

Specifically, what does TransformationParams.target_dist do? It is set to a constant 0.8. You already apply random scaling, so why add a constant factor as well?

Again, thank you for your great code! I have learned a lot from it.

class TransformationParams:
    target_dist = 0.8
    scale_prob = 1.
    scale_min = 0.8
    scale_max = 2.0
    max_rotate_degree = 30.
    center_perterb_max = 20.0
    flip_prob = 0.5

scnuhealthy commented 4 years ago

It sets the scale of the input image for when you train at a single scale rather than multi-scale.

KaiChen1998 commented 4 years ago

@scnuhealthy Thank you for your reply! I'm sorry, but I don't quite follow. In your code, target_dist changes the parameters of the scale affine transformation matrix, so even when I turn off data augmentation, the affine function still changes the original image. :joy:

scnuhealthy commented 4 years ago

Thanks for your question. target_dist controls the input resolution when training at a single scale. (Training at a larger resolution usually gives better results.) If you turn off data augmentation and want to train at the original input resolution, just set target_dist=1.0.
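To make the role of a constant scale factor concrete, here is a minimal sketch of how it could enter a center-scale-recenter affine matrix. This is not the repository's code: `build_affine`, `aug_scale`, and the assumption that the two factors simply multiply are all hypothetical, chosen only to illustrate the point.

```python
import numpy as np

def build_affine(width, height, target_dist=0.8, aug_scale=1.0):
    """Compose a 3x3 homogeneous matrix: move center to origin, scale, move back.

    Hypothetical helper; assumes the effective scale is target_dist * aug_scale.
    """
    cx, cy = width / 2.0, height / 2.0
    scale = target_dist * aug_scale
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    scaling = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=float)
    to_center = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return to_center @ scaling @ to_origin

# Augmentation "off" (aug_scale=1.0) with two different constant factors.
unchanged = build_affine(401, 401, target_dist=1.0)
shrunk = build_affine(401, 401, target_dist=0.8)
print(unchanged)
print(shrunk)
```

Under these assumptions, target_dist=1.0 yields the identity matrix, while target_dist=0.8 leaves a scale of 0.8 on the diagonal and a nonzero translation in the last column (roughly 40.1 pixels for a 401x401 input), so the image is altered even without any random augmentation.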

KaiChen1998 commented 4 years ago

Thank you for your reply. But just to make it clear (you know, students' habits :joy: )

  1. So you mean target_dist is just a kind of scale factor, right? But when I visualize your result with data augmentation turned off, I find that the original image has also been cropped, because when you multiply the affine transformation matrices, the target_dist parameter makes the final column of the resulting matrix nonzero. Is that what you intended, or is it a mistake?

  2. Which way do you think is better during inference: keeping the ratio of image width to height and then padding, as you have done, or simply resizing the image to [401, 401]?

KaiChen1998 commented 4 years ago

@scnuhealthy BTW, is there any possibility we could talk on WeChat?

scnuhealthy commented 4 years ago

To answer your two questions:

1. When visualizing, target_dist should be set to 1.0. I forgot to account for this parameter when writing demo.py.

2. Multi-scale testing will achieve better results. For example, if you train at scale S, test at 0.8S, 1.0S, and 1.2S, and then ensemble the three results, the performance will be better. (This is a common way to improve mAP on the COCO dataset.)
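The multi-scale testing idea can be sketched as follows. This is only an illustration under stated assumptions: `predict_fn` stands in for whatever model produces a per-pixel map, the ensemble is a plain average (the thread does not say how the results are combined), and `resize_nearest` is a simple NumPy stand-in for a real image resize.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = arr.shape[:2]
    rows = np.minimum((np.arange(out_h) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum((np.arange(out_w) * in_w / out_w).astype(int), in_w - 1)
    return arr[rows][:, cols]

def multi_scale_predict(image, predict_fn, scales=(0.8, 1.0, 1.2)):
    """Run predict_fn at several input scales and average the outputs.

    Each output is resized back to the original resolution before averaging.
    """
    h, w = image.shape[:2]
    outputs = []
    for s in scales:
        sh, sw = max(1, int(round(h * s))), max(1, int(round(w * s)))
        heatmap = predict_fn(resize_nearest(image, sh, sw))
        outputs.append(resize_nearest(heatmap, h, w))
    return np.mean(outputs, axis=0)

def dummy_predict(img):
    """Hypothetical model: returns the channel mean as a fake heatmap."""
    return img.mean(axis=-1)

image = np.full((40, 60, 3), 0.5)
ensembled = multi_scale_predict(image, dummy_predict)
print(ensembled.shape)
```

In practice the averaging step would be replaced by whatever ensembling the evaluation pipeline uses; the point here is only the loop over 0.8S, 1.0S, and 1.2S and the resize back to a common resolution.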

KaiChen1998 commented 4 years ago

  1. Sorry, I still don't see why you set this parameter. Why do you still need it after doing data augmentation with the affine matrix? (I think that once data augmentation is used, it's no longer single-scale training, right?)

  2. Unfortunately, that's not what I meant. I mean that when you resize the original image to your network input size, there are two ways: keeping the aspect ratio (height to width) and then padding, or resizing the image directly without keeping it. Which one do you think is better?

scnuhealthy commented 4 years ago

Keeping the ratio through padding is better during inference, and I also use this strategy when training.
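For reference, aspect-ratio-preserving resizing with padding can be sketched like this. It is an assumption-laden illustration, not the repository's implementation: `letterbox` is a hypothetical helper, the padding is centered and filled with zeros (the thread does not say where or how the author pads), and nearest-neighbour sampling stands in for a proper image resize.

```python
import numpy as np

def letterbox(image, target=401, pad_value=0):
    """Resize so the longer side equals `target`, then pad to a square.

    Returns the padded image, the scale applied, and the (top, left) offset,
    which are needed to map predictions back to original coordinates.
    """
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h = min(target, max(1, int(round(h * scale))))
    new_w = min(target, max(1, int(round(w * scale))))
    # Nearest-neighbour sampling of source rows and columns.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]
    canvas = np.full((target, target) + image.shape[2:], pad_value, dtype=image.dtype)
    top, left = (target - new_h) // 2, (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)

image = np.ones((300, 401, 3), dtype=np.uint8)
padded, scale, offset = letterbox(image)
print(padded.shape, scale, offset)
```

Resizing directly to [401, 401] would instead stretch this 300x401 image vertically, which changes the apparent proportions of people in the image.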