justinkay / aldi

Official implementation of "Align and Distill: Unifying and Improving Domain Adaptive Object Detection"
https://aldi-daod.github.io/

ALDI refactor and ViTDet backbones #2

Closed (timmh closed 12 months ago)

timmh commented 1 year ago

This PR adds support for activation checkpointing so that ViTs fit into 32GB of VRAM at full resolution. For checkpointing to work with DDP, we need specific checkpointing settings and cannot use the checkpointing implementation built into the vanilla ViT implementation. For this reason, we currently override the forward method of the ViT backbone, which is hacky, but also the least hacky solution I've found that avoids duplicating lots of unrelated code. I have added the pre-trained ViT weights to the GitHub release, so you have to run download_models.sh before training. So far I have only tested configs/cityscapes/cityscapes_baseline/Base-RCNN-VitDetB-Cityscapes.yaml.
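
For readers unfamiliar with the approach, here is a rough sketch of what overriding a backbone's forward method to add checkpointing can look like. It is not the code in this PR: TinyBackbone, checkpointed_forward and enable_checkpointing are hypothetical names, the backbone is a toy stand-in for a ViT, and the choice of use_reentrant=False is an assumption (the PR does not say which checkpointing settings it uses).

```python
import types

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a ViT backbone: a plain stack of blocks."""

    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)]
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x


def checkpointed_forward(self, x):
    """Replacement forward: recompute each block's activations in backward."""
    for blk in self.blocks:
        if self.training and torch.is_grad_enabled():
            # Assumption: the non-reentrant variant is used here; the PR
            # does not state its exact checkpointing settings.
            x = checkpoint(blk, x, use_reentrant=False)
        else:
            x = blk(x)
    return x


def enable_checkpointing(backbone):
    """Override forward on this instance only, leaving the class untouched."""
    backbone.forward = types.MethodType(checkpointed_forward, backbone)
    return backbone
```

Calling enable_checkpointing(backbone) before wrapping the model in DDP swaps the forward pass on that one instance, so no other model code has to be copied. Outputs and gradients should match the unpatched forward; only peak activation memory and the extra recomputation in backward change.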