mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
MIT License

Allow arbitrary number of input channels in ResNet encoder (not only RGB) #56

Open daniel-j-h opened 6 years ago

daniel-j-h commented 6 years ago

With https://github.com/mapbox/robosat/pull/46 we are changing our model architecture from training the encoder and decoder from scratch to using a pre-trained ResNet for the encoder. The pre-trained ResNet uses three channels (RGB) for the input layer, though.

We need to be able to add arbitrary channels, say, RGB + water mask + elevation + lidar. To do this we need to construct a wrapper module extending the ResNet architecture, copying weights over, and initializing the new channels with zero. In addition, the channel-wise mean and std dev need to be adapted.
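A minimal sketch of the idea in PyTorch, not the robosat implementation: the first convolution is rebuilt with more input channels, the pre-trained RGB weights are copied over, and the weights for the extra channels are set to zero. The names `expand_input_channels` and `extend_stats` are hypothetical, a plain `nn.Conv2d` stands in for the pre-trained ResNet's first layer so the example runs without downloading weights, and the fill values for the extra channels' mean and std dev are placeholders that would have to be computed from the actual data.

```python
import torch
import torch.nn as nn


def expand_input_channels(conv: nn.Conv2d, num_channels: int) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `num_channels` input channels.

    Weights for the original channels are copied over, weights for the
    additional channels are initialized with zero. Assumes groups == 1.
    """
    if conv.groups != 1 or num_channels < conv.in_channels:
        raise ValueError("unsupported convolution or channel count")

    expanded = nn.Conv2d(
        num_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )

    with torch.no_grad():
        expanded.weight.zero_()
        expanded.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            expanded.bias.copy_(conv.bias)

    return expanded


def extend_stats(stats, num_channels, fill):
    """Pad a per-channel mean or std dev list with `fill` for new channels."""
    return list(stats) + [fill] * (num_channels - len(stats))


# Stand-in for the pre-trained ResNet input layer (same shape as conv1).
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
wide_conv = expand_input_channels(rgb_conv, num_channels=6)

# ImageNet RGB statistics; the values for the extra channels are placeholders.
mean = extend_stats([0.485, 0.456, 0.406], 6, fill=0.0)
std = extend_stats([0.229, 0.224, 0.225], 6, fill=1.0)
```

With zero-initialized extra channels the expanded layer initially produces the same activations as the RGB-only layer, so the pre-trained features are preserved at the start of training. With a torchvision ResNet, the layer to replace would be `model.conv1`.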


mikoontz commented 5 years ago

Is there any concern with initializing channels with 0 instead of a value that implies 'missing'? Or is this just me misunderstanding the process?

Looking forward to seeing this develop, regardless!

daniel-j-h commented 5 years ago

If you check out Kaggle competitions with multi-spectral data, or more channels in general, most of them either initialize the additional channels with zero, initialize them randomly, or copy the RGB channels over. I've seen zero initialization working best for most winning solutions.
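For comparison, the three strategies could be sketched like this in PyTorch. This is an illustration under assumptions, not code from any of those solutions: `expanded_weight` is a hypothetical helper, the random variant uses Kaiming initialization as one possible choice, and "copy" is interpreted here as cycling through the pre-trained R, G, B filters.

```python
import torch
import torch.nn as nn


def expanded_weight(pretrained: torch.Tensor, num_channels: int, mode: str = "zero") -> torch.Tensor:
    """Build a first-layer weight tensor with `num_channels` input channels.

    `pretrained` has shape (out_channels, in_channels, kH, kW). The original
    channels are kept as-is; `mode` decides how the extra ones are filled.
    """
    pretrained = pretrained.detach()
    out_ch, in_ch, kh, kw = pretrained.shape
    extra = num_channels - in_ch
    if extra < 0:
        raise ValueError("num_channels must be >= pretrained input channels")

    if mode == "zero":
        new = pretrained.new_zeros(out_ch, extra, kh, kw)
    elif mode == "random":
        # Assumption: Kaiming init, one of several reasonable random schemes.
        new = pretrained.new_empty(out_ch, extra, kh, kw)
        if extra > 0:
            nn.init.kaiming_normal_(new, mode="fan_out", nonlinearity="relu")
    elif mode == "copy":
        # Assumption: reuse the pre-trained filters in order R, G, B, R, ...
        idx = torch.arange(extra) % in_ch
        new = pretrained[:, idx].clone()
    else:
        raise ValueError(f"unknown mode: {mode}")

    return torch.cat([pretrained, new], dim=1)


# Stand-in for pre-trained RGB filters of a ResNet input layer.
rgb_weight = torch.randn(64, 3, 7, 7)
zero_w = expanded_weight(rgb_weight, 5, mode="zero")
rand_w = expanded_weight(rgb_weight, 5, mode="random")
copy_w = expanded_weight(rgb_weight, 5, mode="copy")
```

Only the zero variant leaves the layer's initial output on RGB input unchanged; the random and copy variants add a contribution from the new channels from the first step on.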