sabarim / STEm-Seg

This repository contains the official implementation of the paper "STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos"

The batch size is 1, so how to use distributed data parallel? #5

Closed EDENpraseHAZARD closed 3 years ago

EDENpraseHAZARD commented 3 years ago

Hi, thanks for releasing the code. I'm trying to use DistributedDataParallel (DDP) to train the model on YouTube-VIS, but I find that the batch size on a single GPU is 1, and DDP is of little use when the batch size is 1. The batch size is actually `max_per_gpu`, which is set to 1, as shown in the screenshot below.

[screenshot: training configuration with max_per_gpu = 1]

So I want to know how many nodes and what batch size you used during the experiments.

Ali2500 commented 3 years ago

Hi, since we didn't have the hardware to train on several GPUs, we simulate larger batch sizes by accumulating gradients over multiple training iterations (note that this is not exactly the same as training with a larger batch size, but that's a separate issue). In the above code, the gradients are accumulated `optimizer_step_interval` times before `optimizer.step()` is called. Also keep in mind that under DDP this code is executed in parallel across multiple processes, so even though the batch size given to the data loader is 1, the overall effective batch size is 1 × self.num_gpus × optimizer_step_interval.
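For anyone reading along, here is a minimal sketch of the gradient-accumulation idea described above. It is not the repository's exact training loop; names such as `optimizer_step_interval`, `model`, `data_loader`, and `criterion` are placeholders for illustration.

```python
import torch

def train_epoch(model, data_loader, optimizer, criterion, optimizer_step_interval):
    """Accumulate gradients for `optimizer_step_interval` iterations per optimizer step."""
    model.train()
    optimizer.zero_grad()
    for it, (images, targets) in enumerate(data_loader):
        outputs = model(images)          # per-GPU batch size is 1
        loss = criterion(outputs, targets)
        # Scale the loss so the accumulated gradient approximates the average
        # over the simulated larger batch.
        (loss / optimizer_step_interval).backward()
        # Apply the update only every `optimizer_step_interval` iterations.
        if (it + 1) % optimizer_step_interval == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Under DDP, each of the `num_gpus` processes runs this loop on its own data shard, so the effective batch size per optimizer step is 1 × num_gpus × optimizer_step_interval.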