moonshinelabs-ai / moonshine-remote-sensing

Pretrained remote sensing models for the rest of us.

MAE Pretraining #7

Open · isaaccorley opened this issue 1 year ago

isaaccorley commented 1 year ago

Your docs state that you pretrained the ResNet backbones using the Masked Autoencoding (MAE) SSL method. I was under the assumption that MAE was exclusive to Vision Transformers (ViT) due to the patchifying process. Is this assumption wrong? Could you share the link to the implementation you used for pretraining?

nharada1 commented 1 year ago

Yup, that's actually not a true limitation. For ResNet, I simply mask out the input image in the regions we want it to reconstruct, keeping the same block sizes and masking ratios as the MAE paper.

The one downside of using a ResNet for this is that we don't realize the training-efficiency gains of a sparse model like ViT: we must process even the masked parts of the image. At the size of the ResNets we're training, though, that's not a huge deal from a wall-clock or compute perspective.
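For illustration, here is a minimal PyTorch sketch of what MAE-style block masking could look like in front of a dense encoder. The patch size and mask ratio follow the MAE paper's defaults; the function name and shapes are assumptions, not this repo's actual training code:

```python
import torch

def block_mask(images: torch.Tensor, patch_size: int = 16, mask_ratio: float = 0.75):
    """Zero out a random ~75% of non-overlapping patch_size x patch_size blocks.

    Returns the masked images plus a boolean patch-level mask (True = masked),
    which the loss can later use to score only the reconstructed regions.
    (Hypothetical helper; not from this repository.)
    """
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    num_patches = ph * pw
    num_masked = int(num_patches * mask_ratio)

    # Per-image random permutation of patch indices; the first num_masked are hidden.
    ids = torch.rand(b, num_patches, device=images.device).argsort(dim=1)
    mask = torch.zeros(b, num_patches, dtype=torch.bool, device=images.device)
    mask.scatter_(1, ids[:, :num_masked], True)

    # Upsample the patch-level mask to pixel resolution and zero the masked blocks.
    pixel_mask = mask.view(b, 1, ph, pw).float()
    pixel_mask = pixel_mask.repeat_interleave(patch_size, dim=2).repeat_interleave(patch_size, dim=3)
    return images * (1.0 - pixel_mask), mask
```

Note that, unlike a ViT-MAE encoder that can drop the masked tokens entirely, the convolutional encoder still runs over the full (zeroed-out) image, which is exactly the efficiency loss described above.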

isaaccorley commented 1 year ago

Okay, I figured it was essentially training a U-Net to reconstruct an image after some form of random cutout transformation, but I wanted to double check. I guess the next question is: why not train with an SSL method whose benefits actually carry over to ResNets?
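As an aside, what distinguishes this setup from plain cutout augmentation is the objective: the target is the clean image, and the loss is computed only over the masked patches. A hedged sketch, assuming the hypothetical `block_mask` helper above:

```python
import torch.nn.functional as F

def mae_style_loss(model, images, patch_size: int = 16, mask_ratio: float = 0.75):
    """Reconstruction loss scored only on masked patches (illustrative sketch)."""
    masked, mask = block_mask(images, patch_size, mask_ratio)
    recon = model(masked)  # e.g. a U-Net predicting the full-resolution image

    # Fold pixels into per-patch vectors so the patch-level mask can select them.
    def patchify(x):
        b, c, h, w = x.shape
        ph, pw = h // patch_size, w // patch_size
        x = x.view(b, c, ph, patch_size, pw, patch_size)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(b, ph * pw, -1)

    per_patch = F.mse_loss(patchify(recon), patchify(images), reduction="none").mean(-1)
    return per_patch[mask].mean()  # only masked patches contribute to the loss
```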

nharada1 commented 1 year ago

Truthfully, it's because my long-term plan is to offer more powerful ViT backbones and feature extractors, and I viewed these U-Net variants as proof-of-concept initial releases. I didn't really want to redo the training code.

That said, I'm open to suggestions. Is there an SSL method you find better suited to U-Net architectures? And perhaps one that's better suited to remote sensing as opposed to natural images?