Initial Weights Initialization

wasidennis / AdaptSegNet

Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)


Initial Weights Initialization #92

Open Nadavc220 opened 3 years ago

Nadavc220 commented 3 years ago

When training the model, you initialize the model weights from the checkpoint found here: 'http://vllab.ucmerced.edu/ytsai/CVPR18/DeepLab_resnet_pretrained_init-f81d91e8.pth'

The paper states that the initial baseline is a DeepLab model pre-trained on the ImageNet dataset. Are these weights ImageNet pre-trained, or are you using a GTA5 pre-trained network to initialize the model?

Thanks.

wasidennis commented 3 years ago

@Nadavc220 Sorry for the confusion. In this repo, the pre-trained weights for the VGG backbone are from ImageNet, while those for the ResNet backbone are from ImageNet + COCO (obtained from DeepLab). In practice, we found that ImageNet pre-trained weights converge more slowly but eventually achieve a result similar to ImageNet + COCO. Similarly, GTA5 pre-trained weights should also achieve a similar result, with faster convergence.
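For readers who want to swap in a different initialization (ImageNet only, ImageNet + COCO, or GTA5), the following is a minimal sketch of one common way to copy only the compatible entries of a checkpoint into a model. It is not taken from this repo's training script. The helper `select_compatible`, the `FakeTensor` stand-in, the parameter names, and the 21-class versus 19-class head shapes are all hypothetical and chosen for illustration, so the example runs without PyTorch installed.

```python
from collections import namedtuple


def select_compatible(checkpoint, model_state, skip_prefixes=()):
    """Return checkpoint entries whose name and shape both match the model.

    `checkpoint` and `model_state` map parameter names to objects with a
    `.shape` attribute (for example the dicts returned by `state_dict()`).
    """
    selected = {}
    for name, tensor in checkpoint.items():
        # Optionally skip layers that should keep their fresh initialization.
        if name.startswith(tuple(skip_prefixes)):
            continue
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            selected[name] = tensor
    return selected


# Stand-in for a tensor (assumption: only `.shape` is needed here).
FakeTensor = namedtuple("FakeTensor", ["shape"])

# Hypothetical checkpoint whose classifier head has 21 output classes.
checkpoint = {
    "conv1.weight": FakeTensor((64, 3, 7, 7)),
    "layer5.weight": FakeTensor((21, 2048, 3, 3)),
}
# Hypothetical target model with a 19-class head.
model_state = {
    "conv1.weight": FakeTensor((64, 3, 7, 7)),
    "layer5.weight": FakeTensor((19, 2048, 3, 3)),
}

# Only the backbone entry is kept; the mismatched head is left out.
compatible = select_compatible(checkpoint, model_state)
```

With real PyTorch objects, one would typically merge the result into the model's own state (`state = model.state_dict(); state.update(compatible); model.load_state_dict(state)`), so that any layer not covered by the checkpoint keeps its fresh initialization.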

Nadavc220 commented 3 years ago

Thanks for the quick response. In general, don't you think there is a difference between initializing the net with ImageNet + COCO and initializing it with a GTA5-trained net, other than convergence speed? Did you test this theory or is it just an idea?

Thanks

wasidennis commented 3 years ago

@Nadavc220 This is a good point! Internally, the learning behavior would be quite different, since we need to consider the domain gaps involved. For example, initializing from GTA5 could provide a more stable training procedure, as GTA5 is already a driving-scene dataset. However, since GTA5 also has a large domain gap to Cityscapes, pre-training on GTA5 for too many iterations would also not be good practice (it would overfit to the GTA5 data distribution). This is something we have already tried empirically, but of course it is still an open research problem to explore. To keep things simple, we just use ImageNet (+COCO) pre-trained weights.