tensorflow / models

Models and examples built with TensorFlow

Error: maximum box coordinate value is too large #1754

Closed ericj974 closed 7 years ago

ericj974 commented 7 years ago

System information

During training (R-FCN architecture), the following error appears after a seemingly random number of steps:

InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.0111111]

This is thrown by box_list_ops.to_normalized_coordinates after a failed assertion.

I have to restart from the latest checkpoint to continue training; however, the same type of error can be thrown again.

Log: https://gist.github.com/ericj974/f270855bf6368509c74c05e94b6cb7b8
Config file: https://gist.github.com/ericj974/8af390e1841b4f9be463b70573dc17d1
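
For illustration (hypothetical numbers, not taken from the gist): the assertion trips whenever a ground truth coordinate extends past the image border, because normalizing it then yields a value just above 1.0.

image_width = 450
xmax = 455                    # hypothetical annotation spilling past the right edge
print(xmax / image_width)     # 1.0111..., which exceeds the 1.01 threshold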

schesho commented 7 years ago

Hi, have you found what caused this error? I have the same error here.

Update (fixed): in my case I had mixed up the ground truth regions (height and width were swapped).

ali01 commented 7 years ago

Could you please verify that your ground truth regions are set correctly?

ericj974 commented 7 years ago

Hi, something similar on my side: after looking twice at my ground truth regions, some of them had box coordinates greater than the image width or height.

FPerezHernandez92 commented 7 years ago

I have a similar error: https://pastebin.com/3KrSaEAF but I checked my bbox files and they are all OK. I also deleted the files and replaced them with others, and the error continues. What can I do?

TKassis commented 7 years ago

I'm having the same error when I use either of the following: faster_rcnn_inception_resnet_v2_atrous_coco, rfcn_resnet101_coco

But NOT when I use: ssd_inception_v2_coco, ssd_mobilenet_v1_coco

My training images are a mixture of 300x300 and 450x450 pixels. I don't believe any of my bounding boxes are outside the image coordinates. Even if that were the case, why would the last two models work but not the ResNet-based ones?

slsd123 commented 7 years ago

I'm having the same issue when I train either of the faster_rcnn models or the rfcn_resnet model, but not with either of the ssd models.

TKassis commented 7 years ago

Let me know if you find a solution. I've been struggling with this for 3 days now and I have no clue why this is happening.

yinggo commented 6 years ago

Has anyone solved this problem? I also cannot use faster_rcnn_inception_resnet_v2_atrous_coco or rfcn_resnet101_coco. The error (printed twice in the log) is:

2017-10-25 21:23:15.296232: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.0140625]
[[Node: ToAbsoluteCoordinates/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_FLOAT], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](ToAbsoluteCoordinates/Assert/AssertGuard/Assert/Switch/_167, ToAbsoluteCoordinates/Assert/AssertGuard/Assert/data_0, ToAbsoluteCoordinates/Assert/AssertGuard/Assert/Switch_1/_169)]]

yinggo commented 6 years ago

Actually, I just ignore this error. I changed 1.01 to 1.1 in ./core/box_list_ops.py, since the maximum value in my error was 1.014:

max_assert = tf.Assert(tf.greater_equal(1.1, box_maximum),
                       ['maximum box coordinate value is larger '
                        'than 1.1: ', box_maximum])

This way the code keeps running, and the result does not seem to be affected.

madi commented 6 years ago

I have the same problem. It would be useful if this error reported the name of the image that causes it.

CARASO commented 6 years ago

try check_range=False

kirk86 commented 6 years ago

@yinggo @CARASO thanks for the valuable info. But after making the appropriate changes, I now get the following error:

InvalidArgumentError (see above for traceback): assertion failed: []
Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] 
[y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [16]
 [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_2891, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_0, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_2893, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/BoxClassifierLoss/ones_1/packed/_99)]]

Any suggestions or hints?

CARASO commented 6 years ago

@kirk86 Yes, I also found that with check_range = False there are still other errors. For now I can only bisect the dataset to track down the bad examples. I hope other people have better solutions.

kirk86 commented 6 years ago

@CARASO Hi, sorry for bothering you. I was just wondering, did you also try what @yinggo suggested on top of that? Did you still get the same error?

CARASO commented 6 years ago

@kirk86 Yes, I tried his method, but it didn't help me. Then I used the suggestion I gave you. I think we are both going in the same direction: ignoring this out-of-range error. But I found that direction is wrong, because it only makes the error erupt later. At the moment I have cut my 4000+ image dataset down to only 600+, and with that I am able to train normally. Now I am incrementally adding data back to try to find the problematic examples. There is no better plan at this time. I hope you or other friends can tell me when you have better suggestions.

asturm0 commented 6 years ago

Hey guys, I had the same error and found a solution that worked for me. As my training data I used several classes of the ImageNet dataset and wrote a script that also downloaded the bounding boxes. I then used some scripts from this repo to convert the XML files to a CSV (xml_to_csv.py), and after that turned it into a record file (generate_tfrecord.py). Apparently there is some wrong data in the ImageNet annotations, or another issue with them. I wrote a little script that loads every image listed in my CSV file and checks it against the height/width info from the XML file. In my case there were a couple of wrongly annotated images, and after removing them from my CSV, the training was successful. In case it helps anyone, I uploaded my image-checking script here. You will probably have a different workflow, but I just wanted to highlight that you should double-check your annotation data if you downloaded it from somewhere.
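
A minimal sketch of that kind of check (not the uploaded script; it assumes a CSV with the columns filename,width,height,class,xmin,ymin,xmax,ymax that xml_to_csv.py typically produces):

import pandas as pd
from PIL import Image

df = pd.read_csv('train_labels.csv')              # assumed CSV path
bad = []
for idx, row in df.iterrows():
    with Image.open(row['filename']) as img:      # assumes 'filename' is a resolvable path
        w, h = img.size
    if (int(row['width']) != w or int(row['height']) != h
            or row['xmin'] < 0 or row['ymin'] < 0
            or row['xmax'] > w or row['ymax'] > h):
        bad.append(idx)
print('suspicious annotations:', len(bad))
df.drop(bad).to_csv('train_labels_clean.csv', index=False)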

Cheers

CARASO commented 6 years ago

@asturm0 Yes, I also used similar methods to filter the dataset, but unfortunately the filtered dataset still had exceptions. I had to manually delete more than 1,500 photos. At present I have trained 12,000+ steps and no abnormalities were found. I am sharing the script that filters datasets here; I hope it is useful to others, and that someone can give a more complete solution.

Cheers

tvkpz commented 6 years ago

This error also arises when the bounding boxes are too small compared to the size of the image. I deleted all boxes that are less than 1/16th of the image size and the training works fine. Has anyone tried this before? Is there a specific proportion (instead of 1/16th) that we can take that is tied to how Faster R-CNN is implemented?

CARASO commented 6 years ago

@tvkpz Do you mean the area? Or the length/width?

tvkpz commented 6 years ago

area

CARASO commented 6 years ago

if object_area <= (width * height) / (16 * 16):
    raise Exception('object too small Error')

After filtering the dataset with the check above, I have trained over 450,000 steps and seen no exceptions.

tvkpz commented 6 years ago

The size threshold for the area depends on some of the parameters (anchor sizes, etc.) of the Faster R-CNN algorithm. I'm wondering if anyone knows the exact connection; I have not yet looked into it.

rmekdma commented 6 years ago

In my case, a few images reported different width and height in OpenCV and PIL (w and h were swapped for some reason). I added these lines to create_tf_record.py to skip examples where the size recorded in the annotation (data) does not match the actual image size reported by PIL:

width, height = image.size
if int(data['size']['width']) != width or int(data['size']['height']) != height:
    continue

mape1082 commented 6 years ago

Thanks @asturm0. Your script helped me detect that one of my images had a wrong annotation (a box fully outside of the image; annotated with labelImg).

tamisalex commented 6 years ago

My issue was that for some of my images, the height listed in my Pascal VOC XML was actually the width dimension and vice versa. This may be because I used an out-of-date VATIC build. Hopefully this helps someone.

Sibozhu commented 5 years ago

try check_range=False

Dude, life saver

metromark commented 5 years ago

try check_range=False

Dude, life saver

@Sibozhu Where did you change this?

metromark commented 5 years ago

I have the same problem, it would be useful to return the name of the image that fails when it raises this error

@madi how do you print the image name? I tried scouring for the function that prints the image name but couldn't find it in the repo. Any help would be appreciated!

schliffen commented 5 years ago

I have the same error: InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.100000: ] [1.1015625] with mobilenet_v2_1.0_128. It occurs after saving checkpoints for the first time. It is not because the ground truth is wrong; it seems the algorithm simply does not learn well at first.

anujeet98 commented 5 years ago

@CARASO thanks, your script helped me to identify the mistake.

anujeet98 commented 5 years ago

Use this size checker; annotation boxes should not be too small: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10/blob/master/sizeChecker.py

LyonOconner commented 5 years ago

try check_range=False

Which file?

lightqsvip commented 5 years ago

/software/models/research/object_detection/core/box_list_ops.py
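
For context, a minimal sketch (paraphrased, not the upstream code) of how a check_range flag in that file gates the range assertion, so passing check_range=False simply skips it:

import tensorflow as tf

def to_absolute_coordinates_sketch(boxes, height, width, check_range=True):
    # boxes: [N, 4] float tensor of normalized [ymin, xmin, ymax, xmax]
    height = tf.cast(height, tf.float32)
    width = tf.cast(width, tf.float32)
    if check_range:
        box_maximum = tf.reduce_max(boxes)
        max_assert = tf.Assert(
            tf.greater_equal(1.01, box_maximum),
            ['maximum box coordinate value is larger than 1.01: ', box_maximum])
        # the scaling below only runs once the assertion has passed
        with tf.control_dependencies([max_assert]):
            boxes = tf.identity(boxes)
    return boxes * tf.stack([height, width, height, width])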

codexponent commented 5 years ago

It seems that the training data has some bad annotations. Rather than changing the core file as suggested by @yinggo, I think it would be better to fix the bbox coordinates, if possible, while preparing the dataset. Example: if the normalized value comes out larger than 1 (say 1.04 or 1.08), just reduce xmin, xmax, ymin, and ymax by 1.5 or 2 and then normalize by width and height.
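
A rough sketch of that kind of pre-processing fix (hypothetical helper; x1, y1, x2, y2 are raw pixel coordinates from the annotation):

def clamp_box(x1, y1, x2, y2, img_w, img_h, margin=1):
    # keep the box inside the image, pulled in by a small pixel margin
    xmin = max(0, min(x1, img_w - margin))
    ymin = max(0, min(y1, img_h - margin))
    xmax = max(0, min(x2, img_w - margin))
    ymax = max(0, min(y2, img_h - margin))
    # normalize by width and height so every value ends up in [0, 1]
    return xmin / img_w, ymin / img_h, xmax / img_w, ymax / img_h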

TakumiWzy commented 5 years ago

try check_range=False

Oh, thanks! It seems to have worked! :)

natalhabenatti commented 4 years ago

try check_range=False

Oh, thanks! It seems to have worked! :)

Where did you put this? I didn't find this function in my scripts :(

teffanymae commented 4 years ago

try check_range=False

Omg! You saved my life. T_T

THANK YOU VERY MUCH!!!

D4n1aLLL commented 4 years ago

I have the same problem, it would be useful to return the name of the image that fails when it raises this error

@madi how do you print the image name? I tried scouring for the function that prints the image name but couldn't find it in the repo. Any help would be appreciated!

Where do you get the file name?

git-hamza commented 4 years ago

try check_range=False

Oh, thanks! It seems to have worked! :)

Where did you put this? I didn't find this function in my scripts :(

https://github.com/tensorflow/models/issues/1754#issuecomment-456122182

pytholic commented 4 years ago

Hey guys, I had the same error and found a solution that worked for me. [...] you should double-check your annotation data if you downloaded it from somewhere.

Thank you for this mate!!! This works guys!

CARASO commented 4 years ago

try check_range=False

Omg! You saved my life. T_T

THANK YOU VERY MUCH!!!

Welcome🤣

sainisanjay commented 4 years ago

try check_range=False

@CARASO do you think that putting check_range=False will degrade the performance?

iamrishab commented 4 years ago

Before creating the TF record file, I changed my ground truth like this, which solved my problem. Now the minimum is (2, 2) rather than (0, 0) and the maximum is (w-2, h-2). You can change the margin to 1 and it should also work.

h, w = img.shape[:2]
xmin = max(2, x1)
ymin = max(2, y1)
xmax = min(w-2, x2)
ymax = min(h-2, y2)

MISSIVA20 commented 4 years ago

I wanted to know: when I relaunch training of the model, will that cause a problem, given that I have not been able to solve the issue and I get the same error every time?

AlvaroCavalcante commented 2 years ago

I had this problem and solved it as @rmekdma suggested, by correcting the image resolution when the image is read. It turns out that some images have their width and height swapped because of the EXIF metadata in the file. To prevent this, in my generate_tfrecord.py script I changed how the image is opened from this:

image = Image.open(encoded_jpg_io)

to this:

from PIL import Image, ImageOps
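# exif_transpose applies the EXIF Orientation tag, so width/height match the actual pixel layout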
image = ImageOps.exif_transpose(Image.open(encoded_jpg_io))

And it solved the problem.

iaverypadberg commented 2 years ago

try check_range=False

@CARASO do you think that putting check_range=False will degrade the performance?

If there are a lot of instances where the bounding box is in the wrong spot, then yes, the performance will definitely suffer. If it's just one image, it shouldn't be too much of an issue.