Closed ericj974 closed 7 years ago
Hi, have you found what caused this error? I have the same error here.
Update, fixed: in my case I had mixed up the ground-truth regions (height and width were swapped).
Could you please verify that your ground truth regions are set correctly?
Hi, something similar on my side, after looking twice at my ground truth regions, some of them had their box coordinates greater than either the image width or height.
I have a similar error: https://pastebin.com/3KrSaEAF but I checked my bbox files and everything is OK. I also deleted the files and replaced them with others, and the error continues. What can I do?
I'm having the same error when I use either of the following: faster_rcnn_inception_resnet_v2_atrous_coco, rfcn_resnet101_coco
But NOT when I use: ssd_inception_v2_coco, ssd_mobilenet_v1_coco
My training images are a mixture of 300x300 and 450x450 pixels. I don't believe any of my bounding boxes are outside the image coordinates. Even if that's the case why would the last two models work but not the resnet models?
I'm having the same issue when I train either of the faster_rcnn models or the rfcn_resnet model, but not with either of the ssd models.
Let me know if you find a solution. I've been struggling with this for 3 days now and I have no clue why this is happening.
Has anyone solved this problem? I also cannot use faster_rcnn_inception_resnet_v2_atrous_coco or rfcn_resnet101_coco.
The error is:
```
2017-10-25 21:23:15.296232: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.0140625]
[[Node: ToAbsoluteCoordinates/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_FLOAT], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](ToAbsoluteCoordinates/Assert/AssertGuard/Assert/Switch/_167, ToAbsoluteCoordinates/Assert/AssertGuard/Assert/data_0, ToAbsoluteCoordinates/Assert/AssertGuard/Assert/Switch_1/_169)]]
```
(the same assertion is logged a second time at 21:23:15.296248)
Actually, I just ignore this error by changing ./core/box_list_ops.py:

```python
max_assert = tf.Assert(tf.greater_equal(1.1, box_maximum),
                       ['maximum box coordinate value is larger than 1.1: ',
                        box_maximum])
```

i.e. 1.01 to 1.1, since my maximum error value is 1.014. This way the code keeps running. The result does not seem to be affected.
I have the same problem, it would be useful to return the name of the image that fails when it raises this error
try check_range=False
@yinggo @CARASO thanks for the valuable info. But after doing the appropriate changes now I get the following error:
```
InvalidArgumentError (see above for traceback): assertion failed: []
[Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0]
[y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [16]
[[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_2891, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_0, Loss/BoxClassifierLoss/assert_equal_1/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_2893, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/BoxClassifierLoss/ones_1/packed/_99)]]
```
Any suggestions or hints?
@kirk86 Yes, I also found that with check_range=False there are still other errors. Now I can only bisect the dataset to isolate the bad examples. I hope other friends have better solutions.
@CARASO Hi, sorry to bother you, I was just wondering: did you also try what @yinggo suggested in addition? Did you still get the same error?
@kirk86 Yes, I tried his method, but it didn't help me. Then I used the same suggestion I gave you. I think the two of us are going in the same direction: we are both ignoring this out-of-bounds mistake. But I found this direction is wrong, because it only makes the error erupt later. For now, I have cut the 4000+ image dataset down to only 600+, and I am able to train normally. Next I will incrementally add data back to find the problematic examples. There is no better plan at this time; I hope you or other friends can tell me when you have better suggestions.
Hey guys, I had the same error and found a solution that worked for me. As my training data I used several classes of the ImageNet dataset and wrote a script that also downloaded the bounding boxes. Then I used some scripts of this repo to convert the xml files to a csv (xml_to_csv.py), and after that turned it into a record file (generate_tfrecord.py). Apparently, there is some wrong data in the ImageNet annotations, or another issue with them. However, I wrote a little script that loads every image I had in my csv file and checks it against the height/width info from the xml file. In my case there were a couple of wrongly annotated images, and after removing them from my csv the training was successful. In case it helps anyone, I uploaded my image-checking script here. You will probably have a different workflow, but I just wanted to highlight that you should double-check your annotation data if you have downloaded it from somewhere.
Cheers
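For anyone without the linked script handy, a minimal sketch of that kind of check is below. It compares the width/height stored in a PASCAL VOC XML annotation against the actual image size (e.g. what `PIL.Image.open(path).size` reports) and also flags boxes that fall outside the image. The function name and message strings are my own, not from the linked script.

```python
# Hedged sketch of an annotation sanity check; the expected PASCAL VOC
# layout (<size>, <object>/<bndbox>) is assumed from the xml_to_csv.py /
# generate_tfrecord.py workflow described above.
import xml.etree.ElementTree as ET

def find_annotation_problems(xml_text, actual_width, actual_height):
    """Return human-readable problems for one PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    problems = []

    # 1) The size recorded in the XML must match the decoded image.
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    if (w, h) != (actual_width, actual_height):
        problems.append('size mismatch: xml says %dx%d, image is %dx%d'
                        % (w, h, actual_width, actual_height))

    # 2) Every box must lie inside the actual image bounds.
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (int(box.find(tag).text)
                                  for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        if xmin < 0 or ymin < 0 or xmax > actual_width or ymax > actual_height:
            problems.append('box (%d, %d, %d, %d) outside %dx%d image'
                            % (xmin, ymin, xmax, ymax,
                               actual_width, actual_height))
    return problems
```

Run it over every (xml, image) pair before generating the record file and drop or fix any annotation that returns a non-empty list.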
@asturm0 Yes, I also used similar methods to filter the dataset, but unfortunately the filtered dataset still raised the exception. I had to manually delete more than 1,500 photos. At present 12,000+ steps have been trained and no anomalies have appeared. I am sharing the script that filters the dataset here; I hope it is useful to others, and that someone can give a more complete solution.
Cheers
This error also arises when the bounding boxes are too small compared to the size of the image. I deleted all boxes that are less than 1/16th of the image size and the training works fine. Has anyone tried this before? Is there a specific proportion (instead of 1/16th) we can use that is tied to how Faster R-CNN is implemented?
@tvkpz Are you saying area? Or is it length or width?
area
```python
if object_area <= (width * height) / (16 * 16):
    raise Exception('object too small Error')
```
After filtering the dataset as above, I have trained over 450,000 steps with no exceptions.
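The filter above can be wrapped in a small reusable helper; this is just a restatement of the snippet (the function name is mine):

```python
def box_too_small(xmin, ymin, xmax, ymax, width, height):
    """True when the box area is at most (width * height) / 256, i.e. the
    (16 * 16) threshold used in the filter above."""
    object_area = (xmax - xmin) * (ymax - ymin)
    return object_area <= (width * height) / (16 * 16)
```

For example, on a 320x240 image the cutoff is 300 px², so a 10x10 box is dropped while a 100x100 box is kept.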
The minimum area presumably depends on some of the parameters (anchor sizes etc.) of the Faster R-CNN algorithm. I wonder if anyone knows the connection; I have not yet looked into it.
In my case a few images gave different width and height in OpenCV and PIL (w and h were swapped for some reason). I added these lines to create_tf_record.py:

```python
width, height = image.size
if int(data['size']['width']) != width or int(data['size']['height']) != height:
    continue
```
Thanks @asturm0 . Your script helped me detect that one of my images had a wrong annotation (a box fully outside of the image - used labelimg).
My issue was that for some of my images, the height listed in my pascal xml was actually the width dimension and vice versa. This may be because I used an out-of-date vatic build. Hopefully this helps someone.
> try check_range=False

Dude, life saver
@Sibozhu Where did you change this?
> I have the same problem, it would be useful to return the name of the image that fails when it raises this error

@madi how do you print the image name? I tried scouring for the function that prints the image name but couldn't find it in the repo. Any help would be appreciated!
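Short of patching the library to print the filename, one workaround is to find the offender offline. A hedged sketch, assuming the common filename,width,height,class,xmin,ymin,xmax,ymax CSV layout produced by xml_to_csv.py (adjust the column names to your workflow): it returns every row whose normalized coordinates exceed the 1.01 tolerance from the failing assert.

```python
# Hypothetical offline checker; the column names are assumptions based on
# the xml_to_csv.py workflow mentioned earlier in this thread.
import csv

def bad_rows(csv_path, tolerance=1.01):
    """Return (filename, xmin, ymin, xmax, ymax) in normalized coordinates
    for every annotation that would trip the box_list_ops assert."""
    offenders = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            w, h = float(row['width']), float(row['height'])
            norm = (float(row['xmin']) / w, float(row['ymin']) / h,
                    float(row['xmax']) / w, float(row['ymax']) / h)
            if max(norm) > tolerance or min(norm) < 0.0:
                offenders.append((row['filename'],) + norm)
    return offenders
```

Call `bad_rows('train_labels.csv')` (substitute your own CSV path) and print the result; the filenames it returns are the images tripping the assertion.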
I have the same error: InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.100000: ] [1.1015625]
with mobilenet_v2_1.0_128. It occurs after saving checkpoints for the first time.
It is not because the ground truth is wrong; it seems the algorithm simply does not learn well at first.
@CARASO thanks, your script helped me to identify the mistake
Use this size checker; the annotation box should not be too small: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10/blob/master/sizeChecker.py
> try check_range=False

Which file?
/software/models/research/object_detection/core/box_list_ops.py
It seems that the training data has some bad annotations. Rather than changing the core file as suggested by @yinggo, I think it would be better to fix the bbox coordinates, if possible, while preparing the dataset. Example: if the normalized maximum comes out at 1.04 or 1.08, just reduce xmin, xmax, ymin and ymax by 1.5 or 2 pixels and then normalize by width and height.
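A minimal sketch of that kind of preprocessing fix (the function name and the clip-to-bounds choice are mine; the comment above shrinks coordinates by a fixed amount instead): clip each pixel box to the image bounds before normalizing, so the normalized values can never exceed 1.0.

```python
def clamp_box(xmin, ymin, xmax, ymax, width, height):
    """Clip pixel coordinates to [0, width] x [0, height] before they are
    divided by width/height when writing the TFRecord."""
    def clip(value, upper):
        return max(0, min(value, upper))
    return (clip(xmin, width), clip(ymin, height),
            clip(xmax, width), clip(ymax, height))
```

For example, a box reaching 325 px on a 320 px wide image comes back as 320, giving a normalized xmax of exactly 1.0 instead of the 1.014 in the failed assertion.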
> try check_range=False

Oh, thanks! It seems to have worked! :)
Where did you put this? I didn't find this function in my scripts :(
> try check_range=False

Omg! You saved my life. T_T THANK YOU VERY MUCH!!!
Where do I get the file name?
Where did you put this? I didn't find this function in my scripts :( https://github.com/tensorflow/models/issues/1754#issuecomment-456122182
@asturm0 Thank you for this, mate!!! This works, guys!
> Omg! You saved my life. T_T THANK YOU VERY MUCH!!!

Welcome 🤣
> try check_range=False

@CARASO do you think that putting check_range=False will degrade the performance?
Before creating the TFRecord file, I clamped my ground truth like this, which solved my problem. The minimum is now (2, 2) instead of (0, 0) and the maximum is (w-2, h-2). You can change the 2 to 1 and it should also work.

```python
h, w = img.shape[:2]
xmin = max(2, x1)
ymin = max(2, y1)
xmax = min(w - 2, x2)
ymax = min(h - 2, y2)
```
I wanted to know: when I relaunch the training of the model, will that cause a problem? I have not been able to resolve the issue; I get the same error every time.
I had this problem and solved it as @rmekdma suggested, correcting the difference in image resolution when the image is read. It turns out that some images have the width and height swapped because of the EXIF metadata in the file. To prevent this, in my generate_tfrecord.py script I changed how the image is opened from this:

```python
image = Image.open(encoded_jpg_io)
```

to this:

```python
from PIL import Image, ImageOps

image = ImageOps.exif_transpose(Image.open(encoded_jpg_io))
```

And that solved the problem.
> @CARASO do you think that putting check_range=False will degrade the performance?

If there are a lot of instances where the bounding box is in the wrong spot, then yes, the performance will definitely suffer. If it's just 1 image, it shouldn't be too much of an issue.
System information
During training (rfcn architecture), the following error appears after some random steps:
InvalidArgumentError (see above for traceback): assertion failed: [maximum box coordinate value is larger than 1.01: ] [1.0111111]
Which happens to be thrown by box_list_ops.to_normalized_coordinates after a failed assertion. I have to restart from the latest checkpoint to continue the training, but the same type of error can be thrown again.
Log can be found here: https://gist.github.com/ericj974/f270855bf6368509c74c05e94b6cb7b8 Config file here: https://gist.github.com/ericj974/8af390e1841b4f9be463b70573dc17d1