open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

mmdet - ERROR - The testing results of the whole dataset is empty #2942

Closed Prakhar-97 closed 4 years ago

Prakhar-97 commented 4 years ago

I was trying to train a Cascade Mask R-CNN HRNet model on a custom dataset with COCO-style annotations on my local system.

This is the message I get while training. On looking further, I found that when `single_gpu_test` is run on the validation data, the results are all empty, and I am not able to identify the reason. I am new to this library and any help would be appreciated.
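A quick way to check whether the trained model is predicting anything at all is to load the checkpoint and run inference on a single validation image with the MMDetection 2.x Python API. A minimal sketch, where the config, checkpoint and image paths are placeholders for your own:

```python
# Sketch: inspect raw predictions on one validation image (MMDetection 2.x).
# All paths below are placeholders.
from mmdet.apis import init_detector, inference_detector

config = 'path/to/your/config.py'             # the config you trained with
checkpoint = 'work_dirs/your_run/latest.pth'  # a trained checkpoint
model = init_detector(config, checkpoint, device='cuda:0')

result = inference_detector(model, 'path/to/a/val/image.jpg')
# Mask models return (bbox_results, segm_results); pure detectors return bbox_results.
bbox_results = result[0] if isinstance(result, tuple) else result
for class_id, dets in enumerate(bbox_results):
    if len(dets):
        print(f'class {class_id}: {len(dets)} boxes, max score {dets[:, 4].max():.3f}')
```

If nothing prints even on a training image, the model really is producing no detections above the threshold; if boxes do appear, the problem is more likely on the evaluation/annotation side.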

hellock commented 4 years ago

There can be many reasons for that since you are training the model on a custom dataset, and we cannot debug it for you. E.g., the model has not converged, it suffers from overfitting, the score threshold is too high, or the train/test sets differ too much.
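One of those causes, a score threshold that filters out every box, is easy to rule out: temporarily lower `score_thr` in the test config and re-run evaluation. A hedged sketch for a two-stage model (field names follow the stock 2.x configs; the values are illustrative):

```python
# Sketch: lower the box score threshold used at test time so weak detections
# still reach the evaluator (values are illustrative).
model = dict(
    test_cfg=dict(
        rcnn=dict(
            score_thr=0.001,   # stock configs typically use 0.05
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
```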

Rajat-Mehta commented 4 years ago

I am getting the same error while training on a custom dataset in COCO format:

Loading and preparing results...
2020-06-14 23:36:16,926 - mmdet - ERROR - The testing results of the whole dataset is empty.

what might be the reason and how to solve this?

Prakhar-97 commented 4 years ago

As you are using a custom dataset in the COCO format, make sure that you specify the classes in the config file. This might be one of the reasons ...

```python
# dataset settings
dataset_type = 'CocoDataset'
classes = ('a', 'b', 'c', 'd', 'e')
...
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/train/data',
        ...),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/val/data',
        ...),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file='path/to/your/test/data',
        ...))
...
```

muhammadabdullah34907 commented 4 years ago

@Prakhar-97 Very well said, thank you.

anonymoussss commented 4 years ago

> There can be many reasons for that since you are training the model on a custom dataset, and we cannot debug it for you. E.g., the model has not converged, it suffers from overfitting, the score threshold is too high, or the train/test sets differ too much.

Hello, I am using the COCO dataset to train a detection model. I want to know why the error "The testing results of the whole dataset is empty" is reported when the model has not converged?

tetsu-kikuchi commented 3 years ago

In my case, I got this message in the evaluation at the 5th epoch, but not after the 10th epoch (validation is run every 5 epochs). I used a custom dataset with 45 training samples and 5 validation samples. The train/val loss was not nan at the 5th epoch. I used faster_rcnn_r50_fpn_2x_coco.py. Maybe it would be better if the message were reworded so that its meaning is clearer.
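For anyone wondering where the "every 5 epochs" cadence comes from: in 2.x configs it is controlled by the `evaluation` dict. A minimal sketch mirroring the setup described above:

```python
# Sketch: run validation every 5 epochs and report the COCO bbox metric.
evaluation = dict(interval=5, metric='bbox')
```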

LLsmile commented 2 years ago

I trained YOLOX with my own data and labels and it works well, but I get the same error in mmdet. How can I check whether my JSON file meets mmdet's requirements? My dataset is in COCO format, but I'm not sure whether it is okay for mmdet.
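One way to sanity-check a COCO-style JSON before suspecting the model is to load it with pycocotools, which is what mmdet's `CocoDataset` uses under the hood. A minimal sketch, where 'annotations/train.json' is a placeholder for your own file:

```python
# Sketch: basic sanity check of a COCO-format annotation file.
from pycocotools.coco import COCO

coco = COCO('annotations/train.json')  # raises if the JSON is malformed

print('categories:', [c['name'] for c in coco.loadCats(coco.getCatIds())])
print('images:', len(coco.getImgIds()), '| annotations:', len(coco.getAnnIds()))

# Every bbox should be [x, y, w, h] with positive width and height.
for ann in coco.loadAnns(coco.getAnnIds()):
    x, y, w, h = ann['bbox']
    assert w > 0 and h > 0, f"degenerate bbox in annotation {ann['id']}"
```

The category names printed there should also match the `classes` you set in the config (the point Prakhar-97 made above).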

kurbobo commented 2 years ago

What exactly does this warning tell us? I got this warning on the test set after one epoch and assumed that the model simply couldn't read any element of the test dataset. After reading the comments above, I think it means that the model makes extremely bad predictions on the dataset. So what exactly does this warning mean? That the model didn't predict anything on my dataset? I also think this warning is too vague :(

sarmientoj24 commented 2 years ago

Also getting the same thing

KingofLong commented 2 years ago

I solved this problem while training RetinaNet on my custom dataset by cutting the learning rate in half.

happy20200 commented 2 years ago

> I trained YOLOX with my own data and labels and it works well, but I get the same error in mmdet. How can I check whether my JSON file meets mmdet's requirements? My dataset is in COCO format, but I'm not sure whether it is okay for mmdet.

Hello, have you solved this problem? I am in the same situation and am not sure whether my own dataset is prepared correctly.

krazygaurav commented 2 years ago

I have seen the same issue. If you are using grad_clip, it is possible that your gradients have overshot the clip value. In that case, check your training logs; most likely you will see nan in the losses. If so, you have to change your hyperparameters.
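For reference, gradient clipping in 2.x configs is enabled through the optimizer hook; a minimal sketch (the max_norm value is illustrative, borrowed from common detection configs rather than from this thread):

```python
# Sketch: clip gradients so one bad batch cannot push the weights to nan.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```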

zhaoyuezhy commented 2 years ago

In my case, I got this message in the evaluation at the 5th epoch, but not after the 36th epoch...

David-Biggs commented 2 years ago

> In my case, I got this message in the evaluation at the 5th epoch, but not after the 10th epoch (validation is run every 5 epochs). I used a custom dataset with 45 training samples and 5 validation samples. The train/val loss was not nan at the 5th epoch. I used faster_rcnn_r50_fpn_2x_coco.py. Maybe it would be better if the message were reworded so that its meaning is clearer.

I got the same! Got this message after epoch 5 but then not after epoch 10...

zhangwanjun4843 commented 2 years ago

Reducing the learning rate solved it for me.

gicu8ab2 commented 2 years ago

Reducing the learning rate worked for me also!

yoletPig commented 1 year ago

> As you are using a custom dataset in the COCO format, make sure that you specify the classes in the config file. This might be one of the reasons ... (see the `dataset settings` snippet quoted above)

Thank you so much. Reducing the learning rate couldn't solve my problem, but this works.

koshinryuu commented 1 year ago

But it gets `TypeError: __init__() got an unexpected keyword argument 'classes'` when I add `classes=classes`.

hwaseem04 commented 1 year ago

> But it gets `TypeError: __init__() got an unexpected keyword argument 'classes'` when I add `classes=classes`.

I guess in 3.x the `classes` argument was removed and is now part of the `metainfo` attribute. Try reducing the learning rate by changing it in the config; it worked for me.
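To make that concrete, here is a minimal sketch of how a 3.x-style config passes class names through `metainfo`; the class names and paths are placeholders, not values taken from this thread:

```python
# MMDetection 3.x style: classes live in `metainfo` instead of a `classes`
# argument. All names and paths below are placeholders.
metainfo = dict(classes=('a', 'b', 'c', 'd', 'e'))

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    dataset=dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_root='path/to/your/data/',
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/')))

val_dataloader = dict(
    dataset=dict(
        type='CocoDataset',
        metainfo=metainfo,
        data_root='path/to/your/data/',
        ann_file='annotations/val.json',
        data_prefix=dict(img='val/')))
```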

gitleej commented 1 year ago

It may be caused by METAINFO in the COCO dataset class. Just modify METAINFO in mmdet/datasets/coco.py.

klinsc commented 12 months ago

> There can be many reasons for that since you are training the model on a custom dataset, and we cannot debug it for you. E.g., the model has not converged, it suffers from overfitting, the score threshold is too high, or the train/test sets differ too much.

> Reducing the learning rate solved it for me.

So I found the solution, for whoever stumbles upon the "The testing results of the whole dataset is empty" line: just add this line to your configuration file. (By the way, I am not sure what the default learning rate is, but my copilot said it is 0.02.) `optimizer = dict(lr=0.02 / 10)  # reduce learning rate by 10x`
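A slightly fuller sketch of the same idea; the 0.02 base value matches what is quoted above, and the 3.x form is included in case you are on the newer API:

```python
# Sketch: lower the learning rate 10x, which several people above report
# fixed the empty-results message.
# MMDetection 2.x style:
optimizer = dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001)

# MMDetection 3.x style, where the optimizer sits inside optim_wrapper:
# optim_wrapper = dict(
#     optimizer=dict(type='SGD', lr=0.002, momentum=0.9, weight_decay=0.0001))
```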

linnnnnnnda commented 9 months ago

Lowering the learning rate can resolve this error.

veizei-K commented 6 months ago

> So I found the solution, for whoever stumbles upon the "The testing results of the whole dataset is empty" line: just add this line to your configuration file. ... `optimizer = dict(lr=0.02 / 10)  # reduce learning rate by 10x`

Thank you, brother. I lowered it by half at the beginning, and it still didn't work, as you can