Closed. iampartho closed this issue 4 years ago.
(1) pre-trained model
https://github.com/shinya7y/UniverseNet/releases/download/20.06/res2net50_v1b_26w_4s-3cf99910_mmdetv2.pth
is just an ImageNet pre-trained model for the Res2Net backbone.
I recommend using a COCO pre-trained model by adding
load_from = 'https://github.com/shinya7y/UniverseNet/releases/download/20.07/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth'
to your config.
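For reference, a minimal sketch of how the end of the config file could look after this change (the URL is the one quoted above; the exact placement in your config is an assumption):

```python
# Sketch of the tail of a UniverseNet config (path quoted in this thread).
# `load_from` initializes the whole detector from a COCO-trained checkpoint,
# unlike the backbone-only ImageNet weights referenced by `pretrained`.
load_from = ('https://github.com/shinya7y/UniverseNet/releases/download/'
             '20.07/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_'
             '20200729_epoch_24-c9308e66.pth')
```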
(2) RuntimeError and num_classes
A RuntimeError like "RuntimeError: The size of tensor a (4) must match the size of tensor b (36) at non-singleton dimension 1" occurs when training fails (e.g., loss divergence).
If you don't use COCO pre-trained models, the error is probably unrelated to num_classes and instead caused by randomness.
If you do use COCO pre-trained models, the error might be related to num_classes: if num_classes remains 80, the output layer for your single class is initialized with the weights of the COCO 'person' class.
In both cases, tuning the batch size and learning rate is more important for avoiding the error.
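If you want to keep the COCO pre-trained weights while setting num_classes to 1, one generic workaround in plain PyTorch (a hedged sketch, not a UniverseNet-specific API) is to drop checkpoint tensors whose shapes no longer match before loading, since even `load_state_dict(strict=False)` raises on shape mismatches:

```python
import torch
import torch.nn as nn

def load_compatible(model: nn.Module, ckpt_state: dict) -> list:
    """Load only the checkpoint tensors whose names and shapes match the model.

    Mismatched tensors (e.g. a classification head sized for 80 COCO classes
    when the model now has num_classes=1) are skipped and reported.
    """
    model_state = model.state_dict()
    compatible = {k: v for k, v in ckpt_state.items()
                  if k in model_state and v.shape == model_state[k].shape}
    skipped = [k for k in ckpt_state if k not in compatible]
    model.load_state_dict(compatible, strict=False)
    return skipped

# Toy demonstration: a "checkpoint" head with 80 outputs vs. a model with 1.
old = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 80))
new = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 1))
skipped = load_compatible(new, old.state_dict())
# The 80-class head tensors ('1.weight', '1.bias') are skipped; the shared
# first layer ('0.weight', '0.bias') is loaded from the checkpoint.
```

The skipped head is then trained from its fresh random initialization, which avoids the 'person'-class initialization problem described above.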
(3) batch size
The batch size per GPU is defined by data = dict(samples_per_gpu=4).
The total batch size (16) for COCO follows the papers.
A larger samples_per_gpu is preferable on Colab (1 GPU), if possible.
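Since the reference total batch size is 16, a common companion adjustment is the linear scaling rule for the learning rate (an assumption here, not something the config enforces; the base values below match typical mmdetection COCO schedules and should be checked against your config):

```python
# Linear scaling rule sketch: lr scales with the actual total batch size.
# base_lr=0.01 at base_batch=16 is an assumed reference point.
def scaled_lr(samples_per_gpu: int, num_gpus: int,
              base_lr: float = 0.01, base_batch: int = 16) -> float:
    total_batch = samples_per_gpu * num_gpus
    return base_lr * total_batch / base_batch

# Single-GPU Colab with samples_per_gpu=4 -> total batch size 4.
print(scaled_lr(4, 1))  # 0.0025
```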
Thank you very much for your reply. It cleared up a lot of things for me.
About the batch size: yes, I saw samples_per_gpu=4 in the config file, but if the batch size is 4 (samples_per_gpu=4 and a single GPU), then each epoch should contain 2359 mini-batches for my training data. During training, however, I see a total of 2301 mini-batches. I am not sure why this is happening; I double-checked my annotation files and all the necessary files but can't find any issue. I would really appreciate it if you could shed some light on what the problem might be. Thanks a lot.
get_subset_by_classes() and _filter_imgs() in coco.py may ignore background images (those with no bbox of the target classes) and images that are too small.
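The mini-batch counts in the question are consistent with such filtering; a small arithmetic sketch (the image counts are inferred from the reported batch numbers, assuming batches = ceil(images / batch_size), not taken from the actual dataset):

```python
import math

batch_size = 4
expected_batches = 2359   # reported expectation per epoch
observed_batches = 2301   # reported during training

# expected_batches implies between 9433 and 9436 images in the annotations;
# observed_batches implies between 9201 and 9204 images after filtering.
max_images_expected = expected_batches * batch_size   # 9436
max_images_observed = observed_batches * batch_size   # 9204
min_images_expected = max_images_expected - (batch_size - 1)  # 9433

# So at minimum this many images were filtered out:
filtered_at_least = min_images_expected - max_images_observed
print(filtered_at_least)  # 229
```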
Thank you so much for your replies. I was wondering whether you could provide any official documentation, arXiv preprint, or blog post about UniverseNet, other than this repo.
Unfortunately, I haven't written a technical report yet. For this topic, please check and use another issue: https://github.com/shinya7y/UniverseNet/issues/2
The RuntimeError about the size mismatch is fixed in the latest master. Changing the number of classes is now possible, so I am closing this issue.
The config file I am using is this.
The pretrained model I am using is this (the default pretrained model).
The problem I was having: I converted my dataset to the COCO format and changed coco.py accordingly. I set the model's bbox_head num_classes parameter to 1 (the default was 80), since my dataset has only one class, and started training. Training began, but in the middle of epoch 1 it stopped with a tensor-mismatch RuntimeError. I then changed num_classes back to the default (80), started training again, and this time completed epoch 1 without error. So I suspect the error was caused by using the pretrained weights, and my question is whether there is any way to bypass this, i.e., use the pretrained weights while changing num_classes from 80 to 1.
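For reference, the num_classes change described above would look roughly like this in the config (a hedged sketch; the nesting follows the mmdetection v2 convention used in this thread, and other fields inherit from the base config):

```python
# Override only the head's class count; 80 is the COCO default,
# 1 matches a single-class dataset.
model = dict(
    bbox_head=dict(
        num_classes=1,
    ),
)
```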
PS: I have one short question: how is the batch size selected? Could you please point me to where it is defined in the config file? (I am using Google Colab.)