Closed mxmxlwlw closed 4 years ago
I ran into a similar problem. I started from "object_detection/samples/configs/faster_rcnn_resnet50_coco.config", modified only the "PATH_TO_BE_CONFIGURED" entries, and saved it in the object_detection/model/train folder, as follows:
faster_rcnn_resnet50_coco.config
I also downloaded the trained model (faster_rcnn_resnet50_coco) from the "Tensorflow detection model zoo" and placed it in the "object_detection/model/train" folder as the model checkpoint. Then I ran the training job locally with this command:
python object_detection/train.py --logtostderr \
--pipeline_config_path=./object_detection/model/train/faster_rcnn_resnet50_coco.config \
--train_dir=./object_detection/model/train 2>&1 | tee log.txt
Here is the error info:
NotFoundError (see above for traceback): Key Conv/biases/Momentum not found in checkpoint
[[Node: save_1/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
For details, see the attached log.txt.
Could anyone share their experience resolving this when training locally? Thanks.
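One possible cause, offered as an assumption rather than a confirmed diagnosis: when the downloaded model zoo checkpoint sits directly in train_dir, the trainer may try to resume from it as if it were a training checkpoint and look for optimizer slot variables such as Conv/biases/Momentum, which that checkpoint does not contain. Below is a minimal pre-flight check using only the Python standard library. The helper name find_existing_checkpoints is hypothetical and not part of the Object Detection API, and the file name patterns are the usual TensorFlow checkpoint names, assumed here.

```python
import glob
import os


def find_existing_checkpoints(train_dir):
    """Return names of checkpoint-related files already present in train_dir.

    Hypothetical helper: matches the usual TensorFlow checkpoint file
    names ("checkpoint" and anything containing ".ckpt").
    """
    patterns = ["checkpoint", "*.ckpt*"]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(train_dir, pattern)))
    # Report bare file names, sorted for stable output.
    return sorted(os.path.basename(path) for path in found)
```

If this returns any files before the first training run, it may be worth moving the pretrained checkpoint to a separate folder, pointing fine_tune_checkpoint in the pipeline config at it, and starting with an empty train_dir.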
I ran into the same problem when I tried to train a Mask R-CNN model.
InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [1]
Could anyone help me? Thanks.
+1
I had this problem and solved it as follows:
The TFRecord files should be named pet_train/val.record. I got there by changing faces_only from True to False; see the line here: https://github.com/tensorflow/models/blob/master/research/object_detection/dataset_tools/create_pet_tf_record.py#L49
Then I regenerated the TFRecord files with this command:
python object_detection/dataset_tools/create_pet_tf_record.py \
--label_map_path=object_detection/data/two_label_map.pbtxt \
--data_dir=`pwd` --output_dir=`pwd` --include_masks=True
This produced two TFRecord files named pet_train/val.record, which I then used for training with mask_rcnn_inception_v2_coco.
Hope this helps
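The fix above comes down to the training records actually containing instance masks. Below is a minimal sanity check, assuming each record has already been decoded into a plain dict that maps feature keys to lists of values (the decoding step is not shown and is up to you). The key image/object/mask follows the Object Detection API's TF Example convention, and the function name examples_missing_masks is hypothetical.

```python
# Mask feature key used by the Object Detection API's TF Example format.
MASK_KEY = "image/object/mask"


def examples_missing_masks(examples):
    """Return indices of examples with no (or an empty) mask feature.

    `examples` is an iterable of dicts mapping feature keys to lists of
    values: an assumed, already-decoded view of each TFRecord entry.
    """
    return [
        index
        for index, features in enumerate(examples)
        if not features.get(MASK_KEY)
    ]
```

An empty result means every example carries at least one mask. Any indices returned point at records that would need to be regenerated with masks included.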
@mxmxlwlw Did you solve your issue? I only hit it with pascal_train/val.record, not with pet_train/val.record.
I get this too when I train:
INFO:tensorflow:Error reported to Coordinator: assertion failed: [] [Condition x == y did not hold element-wise:] [x (Loss/BoxClassifierLoss/assert_equal_2/x:0) = ] [0] [y (Loss/BoxClassifierLoss/assert_equal_2/y:0) = ] [2]
[[Node: Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Loss/BoxClassifierLoss/assert_equal_2/All/_139, Loss/RPNLoss/assert_equal/Assert/Assert/data_0, Loss/RPNLoss/assert_equal/Assert/Assert/data_1, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_2, Loss/BoxClassifierLoss/assert_equal_2/x/_141, Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert/data_4, Loss/RPNLoss/ones_1/packed/_143)]]
Caused by op u'Loss/BoxClassifierLoss/assert_equal_2/Assert/Assert', defined at:
File "train.py", line 167, in <module>
tf.app.run()
Has anyone solved this?
I think you need to check your data. I did so and this issue was solved.
@wxianfeng The problem is certainly in the data. Your tf.record doesn't have the mask information. Below are the three mistakes that I corrected.
For the tf.record, change faces_only from True to False.
@epratheeban Yeah, when I set faces_only to False, the TFRecord file is larger than with True.
Training succeeded, but prediction did not. My training data is only 200 examples; is that not enough?
Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
Here's the error info:
run with
pipeline script is
The pretrained model is faster_rcnn_resnet101_coco_11_06_2017