Have a look at section 4.2 of the paper. It states the following:
> The models are named according to the following convention: ZoeD-{RDPT}-{MFT}, where ZoeD is the abbreviation for ZoeDepth, RDPT denotes the datasets used for relative depth pre-training (X denotes no pre-training) and MFT denotes the datasets used for metric depth finetuning. We train and evaluate the following models: ZoeD-X-N, ZoeD-X-K, ZoeD-M12-N, ZoeD-M12-K and ZoeD-M12-NK. All models use the BEiT384-L backbone from timm [44] that was pre-trained on ImageNet. The models ZoeD-X-N and ZoeD-X-K are directly fine-tuned for metric depth on NYU Depth v2 and KITTI respectively without any pre-training for relative depth estimation. ZoeD-M12-N and ZoeD-M12-K additionally include pre-training for relative depth estimation on the M12 dataset mix before the fine-tuning stage for metric depth. ZoeD-M12-NK is also pre-trained on M12, but has two separate heads fine-tuned on both NYU Depth v2 and KITTI. ZoeD-M12-NK† is a variant of this model with a single head but otherwise the same pre-training and fine-tuning procedure. In the supplement, we provide further results for models trained on additional dataset combinations in pre-training and fine-tuning.
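To keep the variants straight, here is that naming convention spelled out as a small Python lookup. The dataset assignments and head counts are transcribed from the quoted paragraph; the dictionary structure itself is just my own sketch:

```python
# ZoeD-{RDPT}-{MFT}: RDPT = relative-depth pre-training data (X = none),
# MFT = metric-depth fine-tuning data. Transcribed from section 4.2.
MODELS = {
    "ZoeD-X-N":     {"pretraining": None,  "finetuning": ["NYU Depth v2"]},
    "ZoeD-X-K":     {"pretraining": None,  "finetuning": ["KITTI"]},
    "ZoeD-M12-N":   {"pretraining": "M12", "finetuning": ["NYU Depth v2"]},
    "ZoeD-M12-K":   {"pretraining": "M12", "finetuning": ["KITTI"]},
    "ZoeD-M12-NK":  {"pretraining": "M12", "finetuning": ["NYU Depth v2", "KITTI"], "heads": 2},
    "ZoeD-M12-NK†": {"pretraining": "M12", "finetuning": ["NYU Depth v2", "KITTI"], "heads": 1},
}
```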
In a nutshell:
As all models share the same weights, I would say that ZoeD-M12-NK† is what "NK" in the code refers to.
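If that reading is correct, the "NK" checkpoint should be loadable via torch.hub. This is a minimal sketch assuming the entrypoint names from the isl-org/ZoeDepth README (ZoeD_N, ZoeD_K, ZoeD_NK) and the model's infer() method; treat both as assumptions to verify against the repo:

```python
import torch

# Sketch: load the "NK" model via torch.hub (entrypoint name assumed
# from the isl-org/ZoeDepth README).
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_NK", pretrained=True)
model.eval()

# Dummy RGB batch (N, 3, H, W) in [0, 1], just to illustrate the call;
# infer() is assumed to return a metric depth map in meters.
image = torch.rand(1, 3, 384, 512)
with torch.no_grad():
    depth = model.infer(image)
print(depth.shape)
```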
Hey guys,
correct me if I'm wrong; I cannot find an explicit explanation of this in the paper.
Best regards