Hello, I am running your code on a single GPU (without distributed mode), but the results differ significantly from those reported in your paper. Is this variation expected? For instance, on the DTD dataset I get 50.1% on a single GPU, whereas the paper reports 54.1% using ViT-B/16.