rayleizhu / BiFormer

[CVPR 2023] Official code release of our paper "BiFormer: Vision Transformer with Bi-Level Routing Attention"
https://arxiv.org/abs/2303.08810
MIT License

Slow Training Speed #50

Open adrianjoshua-strutt opened 1 month ago

adrianjoshua-strutt commented 1 month ago

I'm experiencing extremely slow training while training BiFormer-Small on a custom dataset with an image size of 224x224. Training takes up to a minute per batch with a batch size of 32 on an RTX 3070 GPU. I'm unsure whether this is the expected training speed or whether there is an issue with my training setup.

I have verified that the GPU is being utilized during training. Other models have not shown similar slowdowns on the same hardware and dataset.
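In case it helps narrow things down, here is a minimal sketch of how I time a single training step in isolation. The `step_fn` below is a stand-in for my actual forward/backward/optimizer call, not code from this repo; with a real CUDA model you would also call `torch.cuda.synchronize()` before reading the clock, since GPU kernels run asynchronously:

```python
import time

def time_batches(step_fn, n_batches=10, warmup=2):
    """Average seconds per call of step_fn, skipping warmup iterations
    (the first batches often include kernel compilation / cuDNN autotuning)."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_batches):
        step_fn()
    return (time.perf_counter() - start) / n_batches

# Placeholder standing in for forward + backward + optimizer.step().
avg = time_batches(lambda: sum(i * i for i in range(10_000)))
print(f"avg seconds per batch: {avg:.4f}")
```

Measured this way, the per-batch time stays around a minute for BiFormer-Small but is far lower for other backbones on the same loop.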

Any guidance or suggestions to improve the training speed would be greatly appreciated!

Thank you!