KaiChen1998 opened 4 years ago
It means the scale of the input image you want to use if you train at a single scale rather than multi-scale.
@scnuhealthy Thank you for your reply! But I'm sorry, I don't quite get your idea.
In your code, this target_dist changes the scale parameters in the affine transformation matrix, so even when I disable data augmentation, the affine function still changes the original image. :joy:
Thanks for your question. target_dist controls the resolution of the input when training at a single scale. (Training at a larger resolution usually gives better results.) If you disable data augmentation and want to train at the original resolution of the input, just set target_dist=1.0.
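To make the role of such a factor concrete, here is a minimal sketch of how a target_dist-style factor could enter the scale computation; `build_scale`, `person_scale`, and `aug_multiplier` are illustrative names and assumptions, not the repo's actual API:

```python
def build_scale(person_scale, target_dist=0.8, aug_multiplier=1.0):
    """Hypothetical sketch of a target_dist-style scale factor.

    person_scale: size of the person relative to the image in the
    source annotation; aug_multiplier: random augmentation factor
    (1.0 when augmentation is disabled).
    """
    # Rescale so the person occupies roughly target_dist of the network
    # input, then apply the (optional) random augmentation on top.
    return (target_dist / person_scale) * aug_multiplier

# With augmentation off and target_dist=1.0, a person that already
# fills the frame is left at the original resolution:
print(build_scale(1.0, target_dist=1.0))  # → 1.0
```

Under this reading, target_dist normalizes every person to a fixed fraction of the input before any random scaling is applied.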
Thank you for your reply. But just to make it clear (you know, students' habits :joy: ): so you mean target_dist is just a kind of scale factor, right?
But when I try to visualize your result with data augmentation disabled, I find the original image has also been cropped: when you multiply the affine transformation matrices, the target_dist parameter makes the final column of the resulting matrix nonzero. Is this something you intended, or is there a mistake here?
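On why that last column becomes nonzero even without augmentation: any scaling performed about a center point composes a translate–scale–translate chain, and the product has a nonzero translation column whenever the scale differs from 1. A minimal NumPy sketch (the center coordinates here are made up for illustration):

```python
import numpy as np

def scale_about_center(s, cx, cy):
    """Scale by s about the point (cx, cy) in homogeneous coordinates."""
    t_to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], float)
    scale = np.diag([s, s, 1.0])
    t_back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], float)
    return t_back @ scale @ t_to_origin

M = scale_about_center(0.8, cx=200, cy=150)
# The translation column equals (1 - s) * center, i.e. roughly (40, 30)
# here, even though no random shift was ever applied.
print(M[:2, 2])
```

So a nonzero last column is a natural side effect of scaling about the center rather than about the origin; whether the amount of cropping is intended is a separate question.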
Which way do you think is better during inference: keep the aspect ratio of the image and then pad, as you have done, or simply resize the image to [401, 401]?
@scnuhealthy BTW, is there any chance we could talk on WeChat?
For the two questions:
1. When visualizing, target_dist should be set to 1.0. I forgot to consider this parameter when coding demo.py.
2. Multi-scale testing achieves better results. For example, if you train at scale S, test at 0.8\*S, 1.0\*S, and 1.2\*S, and then ensemble the three results, the performance will be better. (A common way of improving mAP on the COCO dataset.)
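The multi-scale ensembling described above can be sketched roughly as follows; `predict_fn`, the nearest-neighbor resize helper, and averaging the heatmaps are all assumptions for illustration, not the repo's actual code:

```python
import numpy as np

def nn_resize(a, out_h, out_w):
    """Tiny nearest-neighbor resize, to keep the sketch dependency-free."""
    ys = np.arange(out_h) * a.shape[0] // out_h
    xs = np.arange(out_w) * a.shape[1] // out_w
    return a[np.ix_(ys, xs)]

def multi_scale_predict(image, predict_fn, scales=(0.8, 1.0, 1.2)):
    """Run predict_fn at several input scales, map each heatmap back
    to the original resolution, and average the results."""
    h, w = image.shape[:2]
    heatmaps = []
    for s in scales:
        scaled = nn_resize(image, max(1, int(h * s)), max(1, int(w * s)))
        heatmaps.append(nn_resize(predict_fn(scaled), h, w))
    return np.mean(heatmaps, axis=0)
```

Averaging the resized heatmaps is one common ensembling choice; taking the element-wise maximum is another.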
Sorry, I still don't get the point of setting this parameter. Why do you still need it after you do data augmentation with the affine matrix? (I think after using data augmentation it's no longer single-scale training, right?)
Unfortunately that's not my point. I mean that when you resize the original image to your network input size, there are two ways: keep the aspect ratio of height and width and then pad, or resize the image directly without keeping it. Which one do you think is better?
Keeping the aspect ratio through padding is better during inference, and I also use this strategy when training.
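For reference, an aspect-ratio-preserving resize with padding (a "letterbox") can be sketched as below; the 401 input size comes from the question above, while the gray pad value and the nearest-neighbor resize are assumptions chosen to keep the example dependency-free:

```python
import numpy as np

def letterbox(img, size=401, pad_value=128):
    """Resize keeping the aspect ratio, then pad to a square input."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize of the image to (nh, nw).
    ys = np.arange(nh) * h // nh
    xs = np.arange(nw) * w // nw
    resized = img[np.ix_(ys, xs)]
    # Center the resized image on a pad_value canvas.
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    out[top:top + nh, left:left + nw] = resized
    return out, scale, (top, left)
```

Returning `scale` and the padding offsets lets you map predicted keypoint coordinates back to the original image afterwards.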
First of all, thank you so much for providing such great code! I just don't understand some of the parameters you use, especially in the data augmentation part.
Specifically, what does TransformationParams.target_dist do? It has been set to a constant equal to 0.8. I mean, you have done random scaling already, so why do you still want to add a constant factor here? Again, thank you for your great code! I have learned a lot of stuff from it.