microsoft / GLIP

Grounded Language-Image Pre-training

Minimal hardware requirements for finetuning #157

Open Ethantequila opened 10 months ago

Ethantequila commented 10 months ago

I want to finetune this model on my own dataset. What is the minimum GPU hardware needed for finetuning? Thanks!

weinman commented 10 months ago

For whatever it's worth, here was my experience: with a dataset whose median image size was 1024x1024, I needed a GPU with 48 GB to use a batch size of 3; a GPU with 24 GB could handle a batch size of 1.
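
Not official guidance, but those numbers can be turned into a quick pre-flight check before launching a finetuning run. The sketch below assumes PyTorch is installed; `suggest_batch_size` is a hypothetical helper, and the memory thresholds are simply the anecdotal figures above (48 GB → batch size 3, 24 GB → batch size 1), not requirements from the maintainers:

```python
# Rough pre-flight check: read total GPU memory and pick a per-GPU batch
# size based on the anecdotal figures reported above for ~1024x1024 images.
import torch

def suggest_batch_size(device_index: int = 0) -> int:
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA device found; GLIP finetuning needs a GPU.")
    total_gb = torch.cuda.get_device_properties(device_index).total_memory / 1024**3
    if total_gb >= 48:
        return 3  # 48 GB handled batch size 3 in the experience above
    if total_gb >= 24:
        return 1  # 24 GB handled batch size 1
    raise RuntimeError(f"Only {total_gb:.1f} GB of GPU memory; likely too small to finetune.")

print("Suggested per-GPU batch size:", suggest_batch_size())
```

Your mileage will vary with image resolution, backbone choice, and mixed-precision settings, so treat the output as a starting point rather than a guarantee.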