ZBC043 opened this issue 4 months ago
A follow-up question: you mentioned in the README that "The minimum GPU memory is 24GB". I do have a 24 GB GPU, and I am training on Meta-Dataset with this command:
python main.py --output outputs/dino_base_IN1K_1 --dataset meta_dataset --data-path ./data/meta_dataset_h5 --num_workers 2 --base_sources ilsvrc_2012 --epochs 100 --lr 5e-4 --arch dino_base_patch16 --dist-eval --device cuda:0 --fp16
It has been running for an hour, but log.txt still has not logged a single epoch after this line:
{"aircraft": 57.43517497380574, "cu_birds": 83.23636697133382, "dtd": 89.95396995544434, "fungi": 57.585270929336545, "ilsvrc_2012": 72.82337363560994, "omniglot": 75.9695914586385, "quickdraw": 59.56564900080363, "vgg_flower": 95.51134332021077, "n_ways": 15.921875, "n_imgs": 438.29168701171875, "acc1": 74.01008605957031, "acc5": 93.75128936767578, "loss": 1.364902377128601, "acc_std": 10.260849952697754, "epoch": -1}
Do you know what could be the problem?
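The entries in log.txt look like one JSON object per line, each with an "epoch" key (the line above has "epoch": -1). Assuming that format holds, a small script like the one below can report which epochs have been logged so far, so you can check progress without reading the file by hand. This is a minimal sketch. The function names are hypothetical, and the path outputs/dino_base_IN1K_1/log.txt is an assumption based on the --output directory in the command above.

```python
import json
from pathlib import Path


def logged_epochs(log_path):
    """Return the "epoch" values recorded in a JSON-lines log file.

    Assumes each non-empty line is one JSON object, like the line quoted
    above; lines without an "epoch" key are ignored.
    """
    epochs = []
    for line in Path(log_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if "epoch" in record:
            epochs.append(record["epoch"])
    return epochs


def last_training_epoch(log_path):
    """Return the highest logged epoch >= 0, or None if nothing past epoch -1 exists."""
    trained = [e for e in logged_epochs(log_path) if e >= 0]
    return max(trained) if trained else None


if __name__ == "__main__":
    # Hypothetical location, assuming log.txt is written inside the --output directory.
    path = Path("outputs/dino_base_IN1K_1/log.txt")
    if path.exists():
        print("epochs logged:", logged_epochs(path))
        print("last training epoch:", last_training_epoch(path))
```

If last_training_epoch returns None, only the "epoch": -1 entry exists, which matches the situation described here.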
Regarding the first question, --resume should be used:
https://github.com/hushell/pmf_cvpr22/blob/622d656afba5aeb39b70eaa3cceda74ad0b09faf/main.py#L115
Regarding the second question, the log you showed is the evaluation on the validation set before training. If it gets stuck there, I would suggest setting a breakpoint at https://github.com/hushell/pmf_cvpr22/blob/622d656afba5aeb39b70eaa3cceda74ad0b09faf/engine.py#L50 to see how long a forward pass takes.
BTW, 24 GB is not enough for ViT in PMF unless you go for something like MobileViT, because it has to host two copies in the computation graph.
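In addition to stepping through with a breakpoint at engine.py#L50, you can time the forward pass directly. The sketch below is a minimal, stdlib-only timing helper. The names time_call and summarize are hypothetical and not part of PMF. The optional sync hook exists because CUDA kernels run asynchronously. Without a synchronize before and after the call, the timer would mostly measure kernel launch rather than the actual GPU work. Passing torch.cuda.synchronize for sync is an assumption about how you would use it on your setup.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, warmup=1, sync=None, **kwargs):
    """Time fn(*args, **kwargs) and return a list of per-call wall-clock seconds.

    sync: optional zero-argument callable run before starting and before
    stopping the clock (e.g. torch.cuda.synchronize for GPU work).
    """
    for _ in range(warmup):
        fn(*args, **kwargs)  # untimed warm-up calls (caching, autotuning, etc.)
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # wait for any previously queued work before starting the clock
        start = time.perf_counter()
        fn(*args, **kwargs)
        if sync is not None:
            sync()  # wait for this call's work to finish before stopping the clock
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return the mean and worst-case time in seconds."""
    return {"mean_s": statistics.mean(timings), "max_s": max(timings)}
```

At the breakpoint, you could call something like summarize(time_call(model, *inputs, sync=torch.cuda.synchronize)). Here model and inputs stand for whatever the evaluation loop actually passes, run under torch.no_grad() as evaluation normally is. If a single call takes seconds rather than milliseconds, the evaluation pass over the validation episodes would explain the long wait before the first epoch is logged.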
I see! Thank you for your help!
Hey, I was wondering whether you still have the checkpoints trained on Meta-Dataset for dino_base and dino_resnet50. In the repo, I see you shared the checkpoint for dino_small. Many thanks in advance.
Hi, many thanks for your amazing work. I have a question about which argument in the argument parser I should use if I want to take the checkpoint trained on Meta-Dataset and train on CIFAR-FS and miniImageNet. Should I use --pretrained-checkpoint-path or --resume? Many thanks in advance.