Closed: lxtGH closed this issue 3 years ago
Also, I found that each GPU holds 2 images per dataset during training. How do you balance the dataset samples across nodes, since each dataset has a different size?
Hello, I managed to train the model using _~/mseg-semantic/mseg_semantic/tool/train.py_.
I provided a .yaml file as the --config argument (e.g. mseg_3m.yaml). You can modify a bunch of settings there. By default, as far as I understood, each dataset trains on a separate GPU (check the _dataset_gpu_mapping_ entry in the .yaml file). Since the model is trained with distributed data parallelism, the batch size is split equally among the GPUs. I believe that means the smaller datasets get to train for more epochs, and the bigger ones for fewer. Maybe this is a way to combat catastrophic forgetting.
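To make the idea above concrete, here is a minimal sketch of how a dataset-to-GPU mapping plus a global batch size could translate into per-worker settings. The dataset names, GPU assignments, and batch size here are illustrative assumptions, not the actual values or config keys from the mseg .yaml files.

```python
# Hypothetical dataset-to-GPU mapping (illustrative names and ranks only).
dataset_gpu_mapping = {
    "coco-panoptic": [0],
    "ade20k": [1],
    "mapillary": [2, 3],  # a larger dataset might span two GPUs
}

global_batch_size = 8
num_gpus = sum(len(gpus) for gpus in dataset_gpu_mapping.values())
per_gpu_batch_size = global_batch_size // num_gpus  # split equally among GPUs

# Each rank (GPU) looks up which dataset it should load.
rank_to_dataset = {
    rank: name
    for name, gpus in dataset_gpu_mapping.items()
    for rank in gpus
}

print(per_gpu_batch_size)  # 2 -- matching the "2 images per GPU" observation above
print(rank_to_dataset[2])  # mapillary
```

Under this assumed split, a global batch of 8 over 4 GPUs yields the 2 images per GPU mentioned in the question.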
You can also edit the HRNet architecture in _mseg-semantic/mseg_semantic/model/seg_hrnet.yaml_.
Hi @lxtGH, have you read through the training section of the README? https://github.com/mseg-dataset/mseg-semantic#training-instructions
That will point you to the TRAINING.md page: https://github.com/mseg-dataset/mseg-semantic/blob/master/training.md
You can directly run python ~/mseg-semantic/mseg_semantic/tool/train.py
with the desired arguments, as @caiusdebucean mentioned. @caiusdebucean is correct -- we use DDP, with each dataset on a separate GPU, and gradients are reduced across workers.
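The gradient reduction mentioned above can be illustrated with a toy example: each worker computes gradients on its own dataset-specific minibatch, and the gradients are averaged element-wise across workers before every optimizer step, so all model replicas stay in sync. Plain Python lists stand in for tensors here; this is a conceptual sketch, not the actual DDP internals.

```python
# Gradients from two workers, each computed on a different dataset's minibatch.
worker_grads = [
    [0.5, -1.0, 2.0],   # e.g. GPU 0, COCO minibatch
    [1.5,  0.0, -2.0],  # e.g. GPU 1, ADE20K minibatch
]

# All-reduce with averaging: every worker ends up with the same mean gradient.
num_workers = len(worker_grads)
reduced = [sum(g) / num_workers for g in zip(*worker_grads)]
print(reduced)  # [1.0, -0.5, 0.0] -- each worker applies this same update
```

This is why a per-GPU dataset assignment still trains a single shared model: every step blends gradient signal from all datasets at once.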
Please note that we provide code to train dozens of different models (and we have provided weights for dozens of different trained models), so users may be looking for different sorts of training configs.
Which taxonomy, dataset, and resolution are you looking to train for? You can find more details in our paper.
Our HRNet model in Table 2 is trained using the universal taxonomy, for 1 million crops, at 1080p resolution.
Thanks for your reply!
I wonder if all the datasets could instead be merged into one single large dataset (small datasets repeated multiple times, large datasets repeated fewer times). Would the results be different?
@johnwlambert I mean: if I concatenate these datasets by padding the small datasets (e.g. repeating them by a ratio) into one large dataset, then I could train it with 4 GPUs.
It's still a bit of an open research question about the best way to mix datasets together at training time. We wanted to prevent the large datasets from dominating the domains represented by smaller datasets, and we found our solution already worked well. But concatenating into one big dataset and randomly sampling IDs for minibatches would be something interesting to compare against.
If you concatenate into one large dataset by using multiplicative ratios as if they were each on their own GPU, in expectation the minibatches should have the same ratios, so likely the results would be similar to ours. I cannot guarantee it 100%, but it's likely.
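A minimal sketch of the multiplicative-ratio idea discussed above: repeat each smaller dataset enough times that all datasets contribute roughly equal sample counts to one concatenated pool. The dataset names and sizes here are made up for illustration, not the actual MSeg component sizes.

```python
# Hypothetical dataset sizes (number of training images).
dataset_sizes = {"ade20k": 20_000, "camvid": 500, "kitti": 150}

# Repeat each dataset so it roughly matches the largest one.
target = max(dataset_sizes.values())
repeat_factor = {name: round(target / n) for name, n in dataset_sizes.items()}

# Build a flat index pool; uniform random sampling from it now draws each
# dataset with approximately equal probability, mimicking the per-GPU ratios.
pool = [
    (name, i % n)  # (dataset name, index within that dataset)
    for name, n in dataset_sizes.items()
    for i in range(n * repeat_factor[name])
]

print(repeat_factor)  # {'ade20k': 1, 'camvid': 40, 'kitti': 133}
```

Sampling minibatches uniformly from this pool should, in expectation, match the per-GPU dataset ratios, which is why the results would likely be similar.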
Hi @johnwlambert! I checked out the training branch, but I cannot find any instructions for training. How do I train the HRNet model in Table 2 of your paper?
Also, where is train-qvga-mix-copy.sh? Where is train-qvga-mix-cd.sh?
It is very confusing.