Closed: martinkersner closed this issue 9 years ago
As I anticipated, it turned out that the problem was with the data. For annotating images I used this tool, https://github.com/tzutalin/labelImg, which allows assigning the value 0 to the `xmin` and `xmax` tags. These bounding-box coordinates are later decremented by 1 somewhere in the py-faster-rcnn code, which causes an unsigned integer underflow: 0 becomes 65535, and when that is scaled by a factor of 1.5 the result is 98302.5 (the same number I wrote in the post above).
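The wraparound is easy to reproduce with a couple of lines of numpy (a minimal sketch of the failure mode, not the actual py-faster-rcnn code):

```python
import numpy as np

# Pascal VOC coordinates are 1-based, so the code subtracts 1 to make
# them 0-based. If the array holds unsigned 16-bit integers, an xmin of
# 0 wraps around instead of becoming -1.
xmin = np.array([0], dtype=np.uint16)
shifted = xmin - np.uint16(1)   # 0 - 1 wraps to 65535

print(shifted[0])               # 65535
print(shifted[0] * 1.5)         # 98302.5, the value from the warning above
```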
The only thing that still isn't clear to me is that these 0-valued coordinates appeared not only in the class that received 0 mAP.
I hope this helps somebody who is tackling the same problem.
and don't forget to delete the annotations_cache...
Hi!
I am trying to train on my own dataset, which consists of 3 classes. I have already managed to train on the VOC2007 dataset with fewer classes, so I am quite sure the problem isn't caused by a different number of classes.
I am able to finish training and evaluate it successfully, but for one of these 3 classes I get 0 mAP. Digging further, I found that during training numpy sometimes raises `RuntimeWarning: invalid value encountered in log`. This warning is caused by a negative value being passed to the log function.
In `lib/fast_rcnn/bbox_transform.py` on line 16, the two vectors `gt_rois[:, 2]` and `gt_rois[:, 0]` are subtracted, and the log function is later applied to their difference. In some cases their difference is surprisingly negative. The pair of numbers is usually something like (12.809, 111.236), (161.667, 291.667), (636.667, 788.333), but in the problem cases the first number is much larger, e.g. (98302.5, 591). The `gt_rois` array is passed in from the `forward()` method in `lib/rpn/anchor_target_layer.py`.
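For illustration, here is a simplified sketch of the width part of that transform (assumed shape, not the exact py-faster-rcnn source): rois are `(xmin, ymin, xmax, ymax)`, so when a corrupted `xmin` exceeds `xmax` the width goes negative and `np.log` emits exactly this warning and returns NaN.

```python
import numpy as np

# Sketch of the gt_rois[:, 2] - gt_rois[:, 0] width computation followed
# by a log, as in bbox_transform.py. A negative width produces
# "RuntimeWarning: invalid value encountered in log" and a NaN target.
def width_log_target(gt_rois, ex_widths):
    gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
    return np.log(gt_widths / ex_widths)

good = np.array([[12.809, 0.0, 111.236, 0.0]])
bad = np.array([[98302.5, 0.0, 591.0, 0.0]])  # underflowed-and-scaled xmin

print(width_log_target(good, np.array([100.0])))  # finite value
print(width_log_target(bad, np.array([100.0])))   # nan, plus the warning
```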
At first I thought the problem could be with the data, so I checked it and deleted some images that were not RGB (they belonged to the unsuccessfully trained class). I also clamped `xmax` and `ymax` so that they stay within the ranges [0, width-1] and [0, height-1], respectively. Nonetheless, none of these changes helped, and I still receive `RuntimeWarning: invalid value encountered in log` at some points during training.
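A quick sanity check over the annotation files might catch such boxes before training. Here is a hypothetical helper (the tag names follow the Pascal VOC XML format that labelImg writes; the validity rules are an assumption based on VOC coordinates being 1-based):

```python
import xml.etree.ElementTree as ET

# Flag suspect boxes in a Pascal VOC annotation file: since VOC
# coordinates are 1-based, an xmin/ymin of 0 will underflow after the
# "- 1" conversion, and a box with xmax <= xmin (or ymax <= ymin) has a
# non-positive width or height.
def find_bad_boxes(xml_path):
    root = ET.parse(xml_path).getroot()
    bad = []
    for obj in root.iter("object"):
        b = obj.find("bndbox")
        xmin, ymin = int(b.find("xmin").text), int(b.find("ymin").text)
        xmax, ymax = int(b.find("xmax").text), int(b.find("ymax").text)
        if xmin < 1 or ymin < 1 or xmax <= xmin or ymax <= ymin:
            bad.append((xmin, ymin, xmax, ymax))
    return bad
```

Running this over every XML file in the Annotations directory would list the boxes that trigger the underflow.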
Any idea what could be wrong with the data? Or how I could further track down these large values? I know that the mentioned `forward()` method is triggered from `lib/fast_rcnn/train.py` by the `self.solver.step(1)` call, but I still haven't found the place where the `bottom` parameter containing the data is passed in.
Thank you!
Martin