veonua opened this issue 4 years ago
@veonua
Please share a complete code snippet or the steps to reproduce the issue in our environment. That helps us localize the issue faster. Thanks!
@ravikyram https://gist.github.com/veonua/e4186c92df80b49ad3d813f1219d0727
I'm using the latest master version of the Object Detection API. These are the commands I run:
object_detection/model_main_tf2.py --model_dir=./output --pipeline_config_path=checkpoint/pipeline.config --num_train_steps=1000
object_detection/exporter_main_v2.py --input_type=image_tensor --trained_checkpoint_dir="./output" --output_directory="./model" --pipeline_config_path=checkpoint/pipeline.config
please let me know if you need any more information
I get the same error with any model. Attached (TF2 Error.txt) is the terminal output with ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 (I also tried a bunch of other models, including efficientdet_d0_coco17_tpu-32). Here's what I run:
python model_main_tf2.py \
--pipeline_config_path=training/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.config \
--model_dir=training/ \
--alsologtostderr
I get the AssertionError, followed by tons of warnings related to weight loading.
AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:
The platform I use:
I've tried on a brand new machine with a fresh installation as well; the issue is persistent.
Same Error:
AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:
Same error, bump. It happens to me when loading CenterNet.
As a temporary solution, I've removed
status.assert_existing_objects_matched()
The model seems to be working.
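For orientation, that call sits in the checkpoint-restore path used during export. Below is a minimal sketch of the pattern, assuming the standard tf.train.Checkpoint API and illustrative paths; the actual code in exporter_lib_v2.py differs in detail across versions of the Object Detection API:

```python
import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import config_util

# Build the model from the same pipeline.config used for training
# (paths are illustrative).
configs = config_util.get_configs_from_pipeline_file("checkpoint/pipeline.config")
detection_model = model_builder.build(configs["model"], is_training=False)

# Restore the trained checkpoint; expect_partial() silences warnings about
# checkpointed values (e.g. optimizer slots) the restored model does not use.
ckpt = tf.train.Checkpoint(model=detection_model)
status = ckpt.restore(tf.train.latest_checkpoint("./output")).expect_partial()

# This is the call commenters remove as a workaround: it raises AssertionError
# when objects in detection_model found no matching values in the checkpoint
# (e.g. a classification-format checkpoint restored as "detection").
status.assert_existing_objects_matched()
```

Note that skipping the assertion only hides the mismatch; any variables that failed to restore keep their random initial values.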
Same error. It happens to me when loading CenterNet_ResNet50_v1.
Any updates on this? I'm getting the same error with every TF2 model I've tried.
Likely a bug in TF 2.3.0
Change the line in pipeline.config
fine_tune_checkpoint_type: "classification" to fine_tune_checkpoint_type: "detection"
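In other words, in the train_config block of pipeline.config (the checkpoint path shown is illustrative; keep your own):

```
train_config {
  fine_tune_checkpoint: "path/to/pretrained/checkpoint/ckpt-0"  # your existing path
  fine_tune_checkpoint_type: "detection"                        # was "classification"
}
```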
For those who thumbed down this answer, can you provide some feedback as to why it is not the solution? I'm guessing that if you run the fine-tuning training again with this flag and then export the object detection model, it should work.
I just tested this on the faster_rcnn_resnet50_v1_1024x1024_coco17_tpu-8.config model.
First, I trained it with fine_tune_checkpoint_type: "classification"; running exporter_main_v2.py resulted in the AssertionError above (even if I modified the config file to "detection" for the export).
Next, I trained the model again, starting from the original pretrained model, with fine_tune_checkpoint_type: "detection"; running exporter_main_v2.py produced the saved_model correctly (using the "detection" config).
Hey @khu834, I'm also getting the same error while training the model.
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "mobilenet_v2/mobilenet_v2.ckpt-1"
  fine_tune_checkpoint_type: "detection"
  batch_size: 96
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 7500
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .8
          total_steps: 50000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "kaggle_dataset/annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "kaggle_dataset/annotations/train.record"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader: {
  label_map_path: "kaggle_dataset/annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "kaggle_dataset/annotations/test.record"
  }
}
Above is the pipeline.config file used to train the model. The error:
AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program
I'm using TensorFlow 2.4.1 with the detection model ssd_mobilenet_v2_320x320_coco17_tpu-8. Can anyone help me fix this? Thanks in advance.
fine_tune_checkpoint: "mobilenet_v2/mobilenet_v2.ckpt-1"
I have tested ssd_mobilenet_v2 training and export on TF 2.4.0 and it worked fine. Can you try the following?
If the fine_tune_checkpoint and your training config are both in the "detection" format, the export should work.
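For example, a train_config pointing at the checkpoint that ships with the TF2 Detection Zoo download of this model, which is already in "detection" format (the path is illustrative and assumes the extracted ssd_mobilenet_v2_320x320_coco17_tpu-8 folder):

```
train_config {
  fine_tune_checkpoint: "ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
```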
Hi,
I'm also having the same issue. I'm trying to train from scratch by commenting out the fine-tuning parameters, and the same error message occurs when running exporter_main_v2.py.
I'm using:
Modifications made to the original config file:
Also, if I remove status.assert_existing_objects_matched(), I'm able to save the model, but a warning shows up:
WARNING:tensorflow:Skipping full serialization of Keras layer <object_detection.meta_architectures.center_net_meta_arch.CenterNetMetaArch object at 0x7f11e6625f70>, because it is not built. W0817 09:44:23.985459 139717209786176 save_impl.py:76] Skipping full serialization of Keras layer <object_detection.meta_architectures.center_net_meta_arch.CenterNetMetaArch object at 0x7f11e6625f70>, because it is not built.
If I try to reload the model, I'm not able to retrieve any information, because it is loaded as a _UserObject. Error when calling model.summary():
AttributeError: '_UserObject' object has no attribute 'summary'
Thank you in advance.
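A note on that last point: tf.saved_model.load() on a model exported by exporter_main_v2.py returns a generic restored object, not a Keras model, so summary() does not exist; inference goes through the serving signature instead. A minimal sketch, assuming an image_tensor export with illustrative paths and input shape:

```python
import numpy as np
import tensorflow as tf

# Load the exported SavedModel (path is illustrative).
loaded = tf.saved_model.load("model/saved_model")

# The restored object is not a Keras model, so there is no .summary();
# call the serving signature instead.
infer = loaded.signatures["serving_default"]

# For input_type=image_tensor the signature expects a uint8 batch; the input
# name is typically "input_tensor" for Object Detection API exports.
image = tf.constant(np.zeros((1, 300, 300, 3), dtype=np.uint8))
outputs = infer(input_tensor=image)
print(sorted(outputs.keys()))  # e.g. detection_boxes, detection_scores, ...
```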
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/object_detection
2. Describe the bug
I'm hitting the OOM issue on the big models, so I tried to train a dummy model. For some reason the resulting checkpoint files are very small: ckpt-1.index = 247 bytes, ckpt-1.data-00000-of-00001 = 864 bytes.
Export of this dummy model fails with the assertion:
raise AssertionError(("Some Python objects were not bound to checkpointed values, likely due to changes in the Python program: %s") % (list(unused_python_objects),))
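One way to sanity-check a suspiciously small checkpoint is to list what it actually contains; a short sketch using the public TF API (the path is illustrative):

```python
import tensorflow as tf

# List every variable stored in the checkpoint, with its shape.
# A healthy detection checkpoint has hundreds of entries; a ~1 KB
# checkpoint like the one described above will list almost nothing.
for name, shape in tf.train.list_variables("output/ckpt-1"):
    print(name, shape)
```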
3. Steps to reproduce
Train a dummy faster_rcnn model without a fine-tune checkpoint.
4. Expected behavior
The final checkpoint file should be ~100 MB, as in v1, and the export should complete without errors.
5. Additional context
6. System information