mapbox / robosat

Semantic segmentation on aerial and satellite imagery. Extracts features such as: buildings, parking lots, roads, water, clouds
MIT License
2.02k stars · 382 forks

Adds squeeze and excitation (scSE) modules, resolves #157 #161

Open daniel-j-h opened 5 years ago

daniel-j-h commented 5 years ago

For https://github.com/mapbox/robosat/issues/157.

Adds scSE modules :boom: :rocket:

https://arxiv.org/abs/1709.01507

Squeeze-and-Excitation Networks

https://arxiv.org/abs/1803.02579

Concurrent Spatial and Channel 'Squeeze & Excitation' in Fully Convolutional Networks

[Figure: scSE block diagram, from https://arxiv.org/abs/1803.02579]
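The two papers above combine a channel gate (cSE: global average pool, bottleneck MLP, sigmoid) with a spatial gate (sSE: 1x1 conv across channels, sigmoid). A minimal NumPy sketch of that computation, for illustration only (not the robosat PyTorch module; the weight shapes, the reduction ratio, and the additive fusion of the two paths are assumptions chosen for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scse(x, w1, w2, w_spatial):
    """Concurrent spatial and channel squeeze & excitation (scSE) sketch.

    x         : feature map of shape (C, H, W)
    w1, w2    : channel-excitation MLP weights, shapes (C//r, C) and (C, C//r)
    w_spatial : per-channel 1x1-conv weights for the spatial gate, shape (C,)
    """
    # cSE: squeeze space via global average pooling, excite channels
    z = x.mean(axis=(1, 2))                          # (C,)
    gate_c = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # (C,) channel gate
    cse = x * gate_c[:, None, None]

    # sSE: squeeze channels via a 1x1 convolution, excite pixels
    gate_s = sigmoid(np.tensordot(w_spatial, x, axes=([0], [0])))  # (H, W)
    sse = x * gate_s[None, :, :]

    # fuse both recalibrated maps (the paper also discusses max fusion)
    return cse + sse

# hypothetical usage on a small C=8 feature map with reduction ratio r=2
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
y = scse(x,
         0.1 * rng.standard_normal((4, 8)),
         0.1 * rng.standard_normal((8, 4)),
         0.1 * rng.standard_normal(8))
assert y.shape == x.shape
```

Note that with all-zero weights both sigmoid gates evaluate to 0.5, so the block reduces to the identity, which makes it cheap to drop into an existing encoder/decoder.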


@ocourtin maybe this is interesting to you :)

daniel-j-h commented 5 years ago

Just added the scSE modules to our encoders and decoders following the paper recommendation.

Let's see if this thing goes :rocket:

mikanga10 commented 5 years ago

I would like the Daudi Karanja indigenous land map protected area in Kenya.

daniel-j-h commented 5 years ago

What I'm seeing in benchmarks so far is consistently better performance (+4-6 percentage points) for an incredibly small increase in computational cost. I will run some more benchmarks over the next few days, but if nothing wild happens it'd be best to get this in. Fascinating results, love it!

@ocourtin maybe you want to give it a try, too, if you have the time and dataset for this to benchmark it.

jqtrde commented 5 years ago

Also, what a great name :ok_hand:

ocourtin commented 5 years ago

@daniel-j-h Thanks for this !

I gave it a quick try (with robosat.pink) and, for now, I'm not yet able to see a significant improvement (from the metrics) with the scSE blocks.

Will try harder...

daniel-j-h commented 5 years ago

@ocourtin did you find the time to try this branch again? I'm seeing improvements from the scSE blocks at almost no cost when training on my large datasets. It would be great if we could confirm this; otherwise I'm hesitant to just merge it in.

daniel-j-h commented 4 years ago

By now we have https://arxiv.org/abs/1904.11492, which not only compares various attention mechanisms but also comes up with a framework for visual attention and proposes a new global context block within that framework.

I've implemented these attention blocks for my 3d video models in https://github.com/moabitcoin/ig65m-pytorch/blob/706c9e737e42d98086b3af24548fb2bb6a7dc409/ig65m/attention.py#L9-L103

For the 2d segmentation case here we can adapt the 3d code and then e.g. use a couple of global context blocks on top of the last (high-level) resnet feature blocks.
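As a rough illustration of what such a global context block computes, here is a simplified NumPy sketch of the GCNet design (the LayerNorm inside the bottleneck transform is omitted for brevity, and all weight names and shapes are assumptions, not the linked implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_context_block(x, wk, wv1, wv2):
    """Simplified global context (GC) block, after arXiv:1904.11492.

    x   : feature map of shape (C, H, W)
    wk  : context-modelling 1x1-conv weights, shape (C,)
    wv1 : bottleneck transform weights, shape (C//r, C)
    wv2 : bottleneck transform weights, shape (C, C//r)
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                      # (C, HW)
    attn = softmax(wk @ flat)                       # (HW,) spatial attention
    context = flat @ attn                           # (C,) global context vector
    # bottleneck transform; GCNet inserts LayerNorm before the ReLU here
    transform = wv2 @ np.maximum(wv1 @ context, 0.0)
    return x + transform[:, None, None]             # fusion by broadcast add

# hypothetical usage: C=8 features, reduction ratio r=2
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
y = global_context_block(x,
                         0.1 * rng.standard_normal(8),
                         0.1 * rng.standard_normal((4, 8)),
                         0.1 * rng.standard_normal((8, 4)))
assert y.shape == x.shape
```

Because the context vector is pooled once per feature map and added back everywhere, the extra cost is tiny compared to the non-local blocks the paper compares against, which is what makes it attractive on top of the last resnet stages.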

[Figure: comparison of attention mechanisms, from https://arxiv.org/abs/1904.11492]