Closed Philip-Chen closed 6 years ago
I'm facing the same error, and I really need help solving it.
Me too. Has anyone solved it?
If you run without the checkpoint do you still get the assertion errors?
Hi @robieta, what do you mean by running without the checkpoint? Do you mean that I should set 'from_detection_checkpoint:' to 'false' in the configuration file?
When I did this, I got other errors.
Could you please clarify?
What are the errors that you get when from_detection_checkpoint is set to false?
Hi @robieta, when I set from_detection_checkpoint to false (mask_rcnn_inception_resnet_v2_atrous_coco), I got the following errors:
EDIT: (robieta) Moved full output to a separate file obj_detection_output2.txt
C:\Users\hedey\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gradients_impl.py:97: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:root:Variable [InceptionResnetV2/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta] is not available in checkpoint
...
WARNING:root:Variable [InceptionResnetV2/Repeat_2/block8_9/Conv2d_1x1/weights/Momentum] is not available in checkpoint
Traceback (most recent call last):
File "train.py", line 167, in
Do not use the checkpoint, like this:
#fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
from_detection_checkpoint: false
You can try that.
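To confirm that a pipeline config really has the checkpoint disabled as suggested above, the two fields can be read back from the file. The following is a minimal sketch, assuming the config is plain text in which a leading '#' comments out a line; the function name checkpoint_settings is hypothetical and is not part of the object detection API.

```python
import re

# Matches the two checkpoint-related fields discussed in this thread.
FIELD_PATTERN = re.compile(
    r"(fine_tune_checkpoint|from_detection_checkpoint)\s*:\s*(.+)$"
)

def checkpoint_settings(config_text):
    """Report how a pipeline config text sets the two checkpoint fields.

    A field is reported as None when it is absent or commented out with '#'.
    """
    settings = {"fine_tune_checkpoint": None, "from_detection_checkpoint": None}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out fields
        match = FIELD_PATTERN.match(line)
        if match:
            key, value = match.groups()
            settings[key] = value.strip().strip('"')
    return settings

example = '''
train_config: {
  #fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: false
}
'''
print(checkpoint_settings(example))
```

With the snippet from the comment above, fine_tune_checkpoint is reported as None (commented out) and from_detection_checkpoint as the string "false".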
Hi @lulu12132017 ,
Now, I get the following errors:
EDIT: (robieta) Moved full output to a separate file obj_detection_output3.txt
INFO:tensorflow:Error reported to Coordinator: assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [1]
...
InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [1] [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_133, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_135, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/shape/_137)]] [[Node: FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/beta/read/_305 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2367_FirstStageFeatureExtractor/InceptionResnetV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/BatchNorm/beta/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
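To compare the two values in an assertion message like the one above without scanning the whole line, a small parser can help. This is a minimal sketch, assuming the message keeps the "[x (...) = ] [..] [y (...) = ] [..]" layout shown in this thread; the function name parse_assert_equal is hypothetical, and the thread does not say what the x and y tensors represent.

```python
import re

# Layout assumed from the log lines quoted in this thread.
ASSERT_PATTERN = re.compile(
    r"\[x \((?P<x_name>[^)]+)\) = \] \[(?P<x_value>[^\]]*)\] "
    r"\[y \((?P<y_name>[^)]+)\) = \] \[(?P<y_value>[^\]]*)\]"
)

def parse_assert_equal(message):
    """Extract tensor names and printed values from an assert_equal failure.

    Returns a dict with x_name, x_value, y_name, y_value (all strings),
    or None if the message does not have the expected layout.
    """
    match = ASSERT_PATTERN.search(message)
    return match.groupdict() if match else None

message = (
    "assertion failed: [] [Condition x == y did not hold element-wise:] "
    "[x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] "
    "[y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [1]"
)
parsed = parse_assert_equal(message)
print(parsed["x_name"], "=", parsed["x_value"])
print(parsed["y_name"], "=", parsed["y_value"])
```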
Hi @lulu12132017 & @robieta,
I really need your help to get a solution for this, because I need to use the tensorflow object detection API in my master's project.
I'm going to close this and refer you to the tensorflow StackOverflow, as this appears to be a configuration issue rather than a clear bug in the object detection code.
If you think we've misinterpreted a bug, please comment again with a clear explanation, as well as all of the information requested in the issue template. Thanks!
Although @robieta closed the issue, the solution isn't available anywhere. There are multiple issues about this error, with no suggestion of what the configuration problem is or how to actually solve it. Please help.
Hi @SarvMangal, I agree with you. We need a real way of solving this. Even after I followed @robieta's advice and posted on StackOverflow, I haven't received any replies yet. Here is my StackOverflow post: https://stackoverflow.com/questions/50009709/assertion-failed-error-when-using-tensorflow-object-detection-api-to-fine-tune-t
Isn't there any way of reopening this thread? Otherwise I will open one more issue with all the required details.
Even if it is a configuration issue, the documentation is just not enough to help us solve the problem.
When you convert the MIO-TCD dataset into TFRecord, you should set the include_masks parameter like this: --include_masks=True. You can try that.
Hi @lulu12132017, thanks for your reply. However, could you please clarify the following:
Does this require my dataset to have masks data? I'm working on the MIO-TCD dataset and it doesn't have any masks data.
The function that I defined to create a tf_example doesn't take an include_masks parameter, so I'm not clear about where I should set it.
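One way to see whether a generated TFRecord file carries mask data at all is to list the feature keys of its first example. The sketch below rests on assumptions not confirmed in this thread: that masks are stored under the key "image/object/mask", and that a TensorFlow version providing tf.compat.v1.io.tf_record_iterator is installed (older releases exposed the same reader as tf.python_io.tf_record_iterator). The helper names has_masks and first_example_keys are hypothetical.

```python
MASK_KEY = "image/object/mask"  # assumed feature key, not confirmed in this thread

def has_masks(feature_keys):
    """Return True if an example's feature keys include the assumed mask key."""
    return MASK_KEY in set(feature_keys)

def first_example_keys(record_path):
    """Return the sorted feature keys of the first example in a TFRecord file.

    TensorFlow is imported lazily so has_masks() can be used without it.
    """
    import tensorflow as tf
    for serialized in tf.compat.v1.io.tf_record_iterator(record_path):
        example = tf.train.Example.FromString(serialized)
        return sorted(example.features.feature.keys())
    return []  # the file contained no records

# Pure-Python check on illustrative key lists:
print(has_masks(["image/encoded", "image/object/bbox/xmin"]))  # False
print(has_masks(["image/encoded", "image/object/mask"]))       # True
```

On a real file, has_masks(first_example_keys("path/to/your.record")) would report whether the first example includes that key; the path here is a placeholder.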
I have the same issue.
I created the TFRecord files using create_pet_tf_record.py.
Now I am trying to train on my dataset with mask_rcnn,
but I am getting the same issue. Are there any new suggestions, please?
@hedeya1980 I could not post my answer to your question on StackOverflow.
I had this problem and solved it as follows:
The names of the TFRecord files should be pet_train/val.record. I changed them by editing faces_only from True to False; check the line here: https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py#L49
Then I regenerated the TFRecord files with this command:
python object_detection/dataset_tools/create_pet_tf_record.py \
    --label_map_path=object_detection/data/two_label_map.pbtxt \
    --data_dir=`pwd` --output_dir=`pwd` --include_masks=True
This gave me two TFRecord files named pet_train/val.record, which I then used for training with mask_rcnn_inception_v2_coco.
Hope this helps
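The regeneration step above can also be scripted so the flags are kept in one place. This is a minimal sketch using only the flags that appear in this thread; passing faces_only on the command line (rather than editing its default in the script, as described above) is an assumption that the script exposes it as a flag, and the function name build_pet_record_command is hypothetical.

```python
import shlex

def build_pet_record_command(label_map_path, data_dir, output_dir,
                             include_masks=True, faces_only=False):
    """Build the argv list for create_pet_tf_record.py with mask output enabled."""
    return [
        "python",
        "object_detection/dataset_tools/create_pet_tf_record.py",
        "--label_map_path=" + label_map_path,
        "--data_dir=" + data_dir,
        "--output_dir=" + output_dir,
        "--include_masks=" + str(include_masks),
        # Assumption: faces_only is accepted as a command-line flag.
        "--faces_only=" + str(faces_only),
    ]

argv = build_pet_record_command(
    "object_detection/data/two_label_map.pbtxt", ".", "."
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The returned list can be passed to subprocess.run(argv, check=True) from the models/research directory if you prefer to launch the script from Python.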
I have this issue only when I use TFRecord files generated by create_pascal_tf_record.py. I don't have it when I use TFRecord files generated by create_pet_tf_record.py, as I mentioned earlier. Is there any update?
When I set faces_only from True to False, it was solved.
What does faces_only mean?
I am still getting this error. Has anybody figured this out yet?
NotFoundError (see above for traceback): Key Conv/biases/Momentum not found in checkpoint [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
faces_only means the boxes cover only the faces, not the whole body, and no segmentation is produced.
The same error on all datasets and all mask models
System information
(tensorflow) philip_chen@Chen-Lenovo:~/TensorFlow/models/research$ CUDA_VISIBLE_DEVICES=1 python object_detection/train.py --logtostderr --pipeline_config_path=/home/philip_chen/TensorFlow/models/research/object_detection/mask_rcnn_inception_v2_coco_2018_01_28/mask_rcnn_inception_v2_coco.config --train_dir=/home/philip_chen/TensorFlow/models/research/object_detection/mask_rcnn_inception_v2_coco_2018_01_28/train
EDIT: (robieta) Moved full output to a separate file obj_detection_output.txt
/home/philip_chen/anaconda3/envs/tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
...
InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [2] [[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/packed)]]