Open kduy opened 7 years ago
Same problem. Any solution? Any help would be appreciated. @acgtyrant @kduy
@longcw
@abhiML I am refactoring the program, and it is still ongoing, so I have not gotten it to work so far.
Do you load pretrained npy for vgg16?
Yeah first it gives a runtime warning:
RuntimeWarning: invalid value encountered in greater_equal
keep = np.where((ws >= min_size) & (hs >= min_size))[0]
Do you use Python 2? I do not encounter this error.
Yeah I am using 2.7. You are running it on your own dataset?
I ran it for a few steps on the PASCAL VOC 2007 trainval dataset with no problem. If you want to run it on a new dataset, you must adjust the source code yourself.
Yeah but what all do I have to adjust? I just changed the classes in pascal_voc.py and prepared the dataset according to the Pascal VOC 2007 set.
I have not trained the model on a new dataset yet; wait.
Okay
https://github.com/rbgirshick/py-faster-rcnn/issues/65 Could you take a look at this issue?
@acgtyrant going by https://github.com/longcw/faster_rcnn_pytorch/blob/master/faster_rcnn/network.py#L109 as far as I understood if the totalnorm becomes very large, then the norm gets really small and underflow occurs? Is that correct?
No, it is used to prevent overflow.
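For readers following the clipping discussion, here is a minimal NumPy sketch of norm-based gradient clipping. It is not the code from network.py; the function name `clip_gradients` and its arguments are assumptions made for illustration. It rescales all gradients by `clip_norm / max(total_norm, clip_norm)`, so a very large total norm shrinks the gradients instead of letting them grow.

```python
import numpy as np

def clip_gradients(grads, clip_norm):
    """Rescale gradient arrays so their global L2 norm is at most clip_norm.

    Hypothetical helper for illustration, not the repository's function.
    """
    # Global L2 norm over all gradient arrays.
    total_norm = float(np.sqrt(sum(np.sum(np.square(g, dtype=np.float64)) for g in grads)))
    # Scale is 1 when the norm is already small, below 1 otherwise.
    scale = clip_norm / max(total_norm, clip_norm)
    return [g * scale for g in grads], total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # global norm is 13
clipped, norm = clip_gradients(grads, clip_norm=1.0)
```

Clipping only bounds finite gradients. If a gradient already contains NaN (for example because of a bad ground-truth box), the total norm is NaN too and the rescaled gradients stay NaN, which would be consistent with still seeing the error while using the clipping function.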
But I am using that function. Still I am getting the error.
I had the issue described, and I now seem to be able to train without this error when using SGD (with Adam, the loss becomes NaN). I would suggest checking the values in gt_boxes for any image that causes this error. For me, reading the XML files was assigning some negative values, which were being transformed into huge numbers. Also, the PASCAL VOC loader applies -1 to xmin and ymin, so if your bounding boxes start at 0 they will be set to -1, and this caused issues as well. I fixed this in my _load_AFLW_annotation function by taking the absolute value and skipping the subtraction when a value is equal to 0. This may help.
Yeah I was making a similar mistake. In the dataset some of the annotations were wrong (xmin>xmax). Once I corrected those and set the negative values to 0, it worked fine.
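A quick way to find such annotations before training is to scan every box for negative coordinates, coordinates outside the image, and inverted corners. The sketch below is only an illustration: `find_bad_boxes` is a hypothetical helper, and it assumes boxes are stored as 0-based (xmin, ymin, xmax, ymax) rows.

```python
import numpy as np

def find_bad_boxes(boxes, width, height):
    """Return indices of boxes that are negative, out of bounds, or inverted.

    Hypothetical helper; assumes 0-based (xmin, ymin, xmax, ymax) rows.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    xmin, ymin, xmax, ymax = boxes.T
    bad = (
        (xmin < 0) | (ymin < 0)                 # negative coordinates
        | (xmax >= width) | (ymax >= height)    # outside the image
        | (xmin > xmax) | (ymin > ymax)         # inverted corners
    )
    return np.where(bad)[0]

boxes = [
    [10, 20, 50, 60],   # valid
    [-1, 5, 30, 40],    # negative xmin
    [80, 10, 40, 90],   # xmin > xmax
]
bad_idx = find_bad_boxes(boxes, width=100, height=100)
```

Running this over the whole dataset and printing the image name for each flagged index makes it easier to correct the annotation files themselves rather than patching values at load time.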
I have checked my annotations and they are correct, so does anyone know of any other bug that would lead to this problem?
@liyuanyaun I have encountered this problem too. After disabling the shuffle operation in RoIDataLayer() and locating the image on which the error occurs, I found that one of the bounding boxes had xmin=0, and pascal_voc.py, which I imitated, subtracts 1, so gt_boxes got a negative value. Here is a related issue: https://github.com/rbgirshick/py-faster-rcnn/issues/9 (you can search for 'based'). After removing the -1 and deleting the cached ground-truth .pkl file (needed if you created one before), the error was gone.
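To illustrate why a single -1 can turn into a huge coordinate, here is a small sketch. It assumes the box array uses an unsigned integer dtype such as uint16, which is an assumption about the loader rather than something stated in this thread. `to_zero_based` is a hypothetical helper that applies the offset only for 1-based annotations and refuses negative results.

```python
import numpy as np

# Casting -1 to an unsigned 16-bit integer wraps around to a huge value.
wrapped = np.array([0 - 1], dtype=np.int64).astype(np.uint16)

def to_zero_based(coords, one_based):
    """Shift coordinates to 0-based only if the annotations are 1-based.

    Hypothetical helper for illustration.
    """
    coords = np.asarray(coords, dtype=np.int64)
    if one_based:
        coords = coords - 1
    if (coords < 0).any():
        # Fail loudly instead of silently wrapping around.
        raise ValueError("negative coordinate after offset: %s" % coords.tolist())
    return coords.astype(np.uint16)

# Annotations that already start at 0 must not be shifted again.
box = to_zero_based([0, 0, 49, 59], one_based=False)
```

Whichever convention the dataset uses, stale cached ground-truth files still hold the old values, so they have to be deleted after changing the loader, as noted above.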
I am trying to train the model with my own dataset. Sometimes I get this error. I traced the bug and figured out that the network returns an all-zero array after conv3 in faster_rcnn/vgg16.py, and hence returns an all-zero feature map after the forward pass through vgg16. Do you have any clue why? Thanks.