This PR adds support for activation checkpointing so that ViTs fit into 32GB of VRAM at full resolution. For checkpointing to work with DDP, we need specific checkpointing settings and cannot use the checkpointing implementation built into the vanilla ViT implementation. For this reason, we currently override the forward method of the ViT backbone. This is hacky, but it is the least hacky solution I've found that avoids duplicating lots of unrelated code. I have added the pre-trained ViT weights to the GitHub release, so you have to run download_models.sh before training. So far I have only tested configs/cityscapes/cityscapes_baseline/Base-RCNN-VitDetB-Cityscapes.yaml.
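
For reviewers who want the general idea without reading the diff, here is a minimal sketch of the forward-override pattern. It is not the code in this PR. `TinyBackbone`, `checkpointed_forward`, and `enable_checkpointing` are hypothetical names, the backbone is a toy stand-in for the ViT, and I am assuming that the "specific checkpointing settings" amount to something like PyTorch's non-reentrant mode (`use_reentrant=False`), which is only one plausible choice. DDP wrapping is left out so the sketch runs on a single CPU.

```python
import types

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for the ViT backbone: a plain stack of blocks."""

    def __init__(self, dim=16, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)]
        )

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x


def checkpointed_forward(self, x):
    """Replacement forward: recompute each block's activations in backward."""
    for blk in self.blocks:
        if self.training:
            # Assumption: non-reentrant checkpointing is the DDP-friendly setting.
            x = checkpoint(blk, x, use_reentrant=False)
        else:
            # No backward pass at inference time, so nothing to save.
            x = blk(x)
    return x


def enable_checkpointing(backbone):
    """Bind the replacement forward to this one instance, leaving the class as is."""
    backbone.forward = types.MethodType(checkpointed_forward, backbone)
    return backbone


if __name__ == "__main__":
    model = enable_checkpointing(TinyBackbone()).train()
    loss = model(torch.randn(2, 16)).sum()
    loss.backward()  # block activations are recomputed here instead of stored
```

Overriding `forward` on the instance keeps the upstream class untouched, which is the trade-off described above: a small patch instead of a copy of the whole backbone.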