raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

ADE20K batchsize #5

Closed RainHxj closed 2 years ago

RainHxj commented 2 years ago

For the ADE20K dataset, is the batch size 32 (4 per GPU × 8 GPUs)?

raoyongming commented 2 years ago

@RainHxj Yes, we use 8 GPUs in our experiments. The global batch size is 32.
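For context, here is a minimal sketch of how this is typically expressed in an mmsegmentation-style config (which DenseCLIP builds on); the exact field values are illustrative, not copied from the repo:

```python
# mmsegmentation-style data config: the global batch size is
# samples_per_gpu * num_gpus, so 4 * 8 = 32 in this setup.
data = dict(
    samples_per_gpu=4,   # per-GPU batch size
    workers_per_gpu=4,   # data-loading workers per GPU (illustrative)
)
```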

RainHxj commented 2 years ago

Thanks for your reply. The text encoder is frozen during training, which is implemented by setting lr=0.0. Compared with torch.no_grad(), does this approach affect training speed?

raoyongming commented 2 years ago

Although the text encoder is fixed during training, we still need to compute gradients through it to update the learnable context (the input of the text encoder). Therefore, torch.no_grad() cannot be used to accelerate training.
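A minimal PyTorch sketch of this distinction (not DenseCLIP's actual code; the encoder, context, and loss here are hypothetical stand-ins):

```python
import torch
import torch.nn as nn

# Hypothetical frozen text encoder and learnable context tokens.
text_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)
context = nn.Parameter(torch.randn(8, 512))  # learnable context tokens

# "Freezing" via lr=0.0: gradients still flow *through* the encoder
# (needed to update `context`), but its weights never change.
optimizer = torch.optim.AdamW([
    {"params": text_encoder.parameters(), "lr": 0.0},
    {"params": [context], "lr": 1e-4},
])

class_embed = torch.randn(1, 4, 512)  # placeholder class-name embeddings
tokens = torch.cat([context.unsqueeze(0), class_embed], dim=1)

# This forward/backward must NOT run under torch.no_grad():
# no_grad would also block the gradient needed to update `context`.
loss = text_encoder(tokens).sum()  # placeholder objective
loss.backward()
optimizer.step()
```

Setting lr=0.0 for the encoder's parameter group keeps them fixed while still paying the backward-pass cost through the encoder, which is exactly why the frozen text encoder does not speed up training here.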

RainHxj commented 2 years ago

Thanks. [screenshot attached] Does the red box denote using a fixed prompt, such as "a photo of"?

raoyongming commented 2 years ago

We use learnable context in this case (Eq. 3 in our paper). Using a hand-crafted prompt ("a photo of a [class name]") leads to a slightly worse result.
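For reference, a minimal sketch of the learnable-context idea: shared learned context vectors are prepended to each class-name embedding before the frozen text encoder. All names and shapes below are illustrative, not the repository's API:

```python
import torch
import torch.nn as nn

# n_ctx learnable context vectors shared across all classes.
n_ctx, dim, n_classes = 8, 512, 150   # e.g. 150 ADE20K classes

ctx = nn.Parameter(torch.empty(n_ctx, dim))
nn.init.normal_(ctx, std=0.02)

# Placeholder class-name token embeddings; in practice these come from
# tokenizing each class name with CLIP's tokenizer and embedding it.
name_embed = torch.randn(n_classes, 5, dim)

# Prepend the shared learnable context to every class's name embedding.
prompts = torch.cat(
    [ctx.unsqueeze(0).expand(n_classes, -1, -1), name_embed],
    dim=1,
)   # (n_classes, n_ctx + 5, dim), fed to the frozen text encoder
```

The learned vectors replace the fixed "a photo of" tokens, so the prompt context is optimized end-to-end for the dense prediction task instead of being hand-crafted.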

RainHxj commented 2 years ago

Thanks for your reply.