Open JulienMaille opened 7 months ago
Hi @JulienMaille,
I'll look into this as soon as I get back home, or maybe @isaaccorley can take a look in the meantime. Btw, you're right: since the encoders use timm as the backbone, you can now select a specific norm layer for them, but the decoders still only support BatchNorm.
We already had the idea of supporting different norm layers for the decoders as well. We just have to think about the best way to do that, because different norm layers obviously need different parameters (as briefly outlined here).
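To make the "different parameters for different norm layers" point concrete, here is a rough sketch of one possible approach, not what torchseg does or will do. The names `make_norm`, `ConvNormAct`, and `norm_kwargs` are hypothetical, and the default of 32 groups for GroupNorm is just an assumption:

```python
import torch
from torch import nn


def make_norm(name: str, num_channels: int, **kwargs) -> nn.Module:
    """Hypothetical helper: build a 2D norm layer from a string name.

    Extra keyword arguments are forwarded to the layer, since each
    norm type takes a different set of parameters.
    """
    name = name.lower()
    if name == "batchnorm":
        return nn.BatchNorm2d(num_channels, **kwargs)
    if name == "instancenorm":
        return nn.InstanceNorm2d(num_channels, **kwargs)
    if name == "groupnorm":
        # GroupNorm needs a group count that divides num_channels.
        # The default of 32 is an assumption made for this sketch.
        num_groups = kwargs.pop("num_groups", 32)
        return nn.GroupNorm(num_groups, num_channels, **kwargs)
    raise ValueError(f"Unsupported norm layer: {name!r}")


class ConvNormAct(nn.Sequential):
    """Hypothetical decoder building block: conv -> norm -> ReLU."""

    def __init__(self, in_ch, out_ch, norm="batchnorm", norm_kwargs=None):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            make_norm(norm, out_ch, **(norm_kwargs or {})),
            nn.ReLU(inplace=True),
        )


# Usage: a block that uses GroupNorm with 4 groups instead of BatchNorm.
block = ConvNormAct(3, 8, norm="groupnorm", norm_kwargs={"num_groups": 4})
out = block(torch.randn(2, 3, 16, 16))
```

Passing a name plus a `norm_kwargs` dict keeps the decoder signature stable while still allowing per-layer parameters; whether that is the right API for the decoders is exactly the open question.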
Once we implement this functionality, integrating schedule-free AdamW should be easy.
Have you seen this optimizer? Has anyone given it a try? It seems a bit less straightforward to integrate into torchseg, since most of our decoders use BatchNorm. https://github.com/facebookresearch/schedule_free?tab=readme-ov-file