shinya7y / UniverseNet

USB: Universal-Scale Object Detection Benchmark (BMVC 2022)
Apache License 2.0

For UniverseNet, is it possible to use pretrained weights while changing the class number from 80 to 1? #4

Closed iampartho closed 4 years ago

iampartho commented 4 years ago

The config file I am using is this

The pretrained weights I am using are these (the default pretrained weights)

The problem I was having is this: I converted my dataset to the COCO format and changed coco.py accordingly. I set the model's bbox_head num_classes parameter to 1 (the default is 80), since my dataset has only one class, and started training. Training began, but in the middle of epoch 1 it stopped with a tensor-size-mismatch RuntimeError. Then I changed num_classes back to the default (80), started training again, and this time completed epoch 1 without any error. So I suspect the error was caused by using the pretrained weights. My question is: is there any way to bypass this, i.e. use the pretrained weights while also changing num_classes from 80 to 1?

PS: I have one short query: how is the batch size selected? Could you point me to where it is defined in the config file? (I am using Google Colab.)

shinya7y commented 4 years ago

(1) pre-trained model

https://github.com/shinya7y/UniverseNet/releases/download/20.06/res2net50_v1b_26w_4s-3cf99910_mmdetv2.pth is just an ImageNet pre-trained model for the Res2Net backbone.

I recommend using a COCO pre-trained model instead, by adding

load_from = 'https://github.com/shinya7y/UniverseNet/releases/download/20.07/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth'

to your config.
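For reference, here is a minimal sketch of how the two changes (num_classes and load_from) might look together in an MMDetection-style config. The _base_ path below is hypothetical; point it at the config file you are actually using.

```python
# Sketch of a config that changes num_classes and loads COCO pre-trained weights.
# The _base_ path is hypothetical; replace it with your actual config file.
_base_ = './universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco.py'

model = dict(
    bbox_head=dict(
        num_classes=1))  # the dataset has a single class

# Initialize from the COCO pre-trained checkpoint instead of only the ImageNet backbone.
load_from = 'https://github.com/shinya7y/UniverseNet/releases/download/20.07/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth'
```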

(2) RuntimeError and num_classes

A RuntimeError like RuntimeError: The size of tensor a (4) must match the size of tensor b (36) at non-singleton dimension 1 occurs when the training fails (e.g., loss divergence). If you don't use a COCO pre-trained model, the error is probably not related to num_classes, but due to randomness. If you already use a COCO pre-trained model, the error might relate to num_classes: if num_classes remains 80, the output channel for your only class is initialized with the weights of the COCO 'person' class. In both cases, tuning the batch size and learning rate will be more important for avoiding the error.
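If loading the 80-class COCO checkpoint into a 1-class model ever complains about mismatched tensor shapes, one generic workaround (not specific to this repo; the helper below is an illustration written against plain PyTorch) is to drop the checkpoint weights whose shapes no longer match before loading, so the backbone and neck still benefit from pre-training while the head is re-initialized:

```python
import torch

def load_matching_weights(model, checkpoint_path):
    """Copy only the checkpoint weights whose shapes match the current model.

    Head layers whose shape depends on num_classes are skipped, so an
    80-class COCO checkpoint can still initialize the backbone/neck of a
    1-class model. Returns the keys left at their random initialization.
    """
    ckpt = torch.load(checkpoint_path, map_location='cpu')
    state_dict = ckpt.get('state_dict', ckpt)
    model_state = model.state_dict()
    matching = {k: v for k, v in state_dict.items()
                if k in model_state and v.shape == model_state[k].shape}
    model.load_state_dict(matching, strict=False)
    return sorted(set(model_state) - set(matching))
```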

(3) batch size

The batch size per GPU is defined by data = dict(samples_per_gpu=4). The total batch size of 16 for COCO (4 GPUs x 4 samples per GPU, as in the 4x4 in the config name) follows the papers. On Colab (1 GPU), a larger samples_per_gpu will be preferable, if memory allows.
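As an illustration of how this might be adapted on a single GPU, here is a hedged sketch: the samples_per_gpu value and the optimizer override are placeholders to tune, and the base lr of 0.01 for a total batch size of 16 is the usual GFL-style default rather than something stated in this thread.

```python
# Illustrative single-GPU overrides for an MMDetection-style config.
# samples_per_gpu and lr below are placeholders, not recommended values.
data = dict(
    samples_per_gpu=8,   # per-GPU batch size; total batch size = 8 on 1 GPU
    workers_per_gpu=2)

# Linear scaling rule of thumb: if the baseline lr (e.g. 0.01) was set for a
# total batch size of 16, scale it by 8/16 when training with batch size 8.
optimizer = dict(type='SGD', lr=0.01 * 8 / 16, momentum=0.9, weight_decay=0.0001)
```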

iampartho commented 4 years ago

Thank you very much for your reply. It cleared up a lot of things for me.

About the batch size: yes, I saw samples_per_gpu=4 in the config file. But if the batch size is 4 (samples_per_gpu=4 on a single GPU), then each epoch should contain 2359 mini-batches for my training data, whereas during training I see a total of 2301 mini-batches. I am not sure why this is happening; I double-checked my annotation files and all the other necessary files but can't find any issue. I would really appreciate it if you could shed some light on what the problem might be here. Thanks a lot.

shinya7y commented 4 years ago

get_subset_by_classes() and _filter_imgs() in coco.py may filter out background images (images with no bboxes of the target classes) and images that are too small, which would leave you with fewer mini-batches than the raw image count suggests.
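The gap in the thread above is 2359 - 2301 = 58 mini-batches, i.e. roughly 232 images at batch size 4. To check whether filtering accounts for that, you could count the candidates directly from the annotation file. A small sketch using pycocotools, with a hypothetical annotation path (32 px is the usual min_size default used by _filter_imgs):

```python
from pycocotools.coco import COCO

# Hypothetical path; replace with your actual training annotation file.
coco = COCO('data/my_dataset/annotations/train.json')

all_img_ids = set(coco.getImgIds())
imgs_with_ann = {ann['image_id'] for ann in coco.anns.values()}
imgs = coco.loadImgs(list(all_img_ids))
too_small = [img['id'] for img in imgs if min(img['width'], img['height']) < 32]

print('total images:', len(all_img_ids))
print('images without any annotation:', len(all_img_ids - imgs_with_ann))
print('images smaller than 32 px on a side:', len(too_small))
```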

iampartho commented 4 years ago

Thank you so much for your replies. I was wondering whether you could point me to any official documentation, arXiv pre-print, or blog post about UniverseNet, other than this repo.

shinya7y commented 4 years ago

Unfortunately, I haven't written a technical report yet. For this topic, please check and use the other issue: https://github.com/shinya7y/UniverseNet/issues/2

shinya7y commented 4 years ago

The size-mismatch RuntimeError is fixed in the latest master, and changing the class number is possible, so I am closing this issue.