tensorflow / models

Models and examples built with TensorFlow

[Object detection] Mismatch between pretrained model and configs for ssd_mobilenet_v2_coco #5315

Closed yimintsai closed 4 years ago

yimintsai commented 6 years ago

System information

Describe the problem

In the object detection model zoo, the pretrained ssd_mobilenet_v2_coco model (ssd_mobilenet_v2_coco_2018_03_29) has a different network structure from the one described in samples/configs/ssd_mobilenet_v2_coco.config. The pipeline.config shipped with ssd_mobilenet_v2_coco_2018_03_29 uses depthwise convolutions in the feature extractor, similar to ssdlite, whereas samples/configs/ssd_mobilenet_v2_coco.config uses standard conv2d. This mismatch leads to incompatible variable shapes when ssd_mobilenet_v2_coco_2018_03_29 is used as the initial fine-tuning checkpoint.

Source code / logs

WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable.
WARNING:root:Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable.

tensorflowbutler commented 6 years ago

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

Exact command to reproduce

yimintsai commented 6 years ago

@tensorflowbutler updated.

tanndx17 commented 6 years ago

I also encountered this problem: a mismatch between the pretrained model and the config for ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync.

Source code / logs:

WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7fb15d3b5bf8>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /data/tf-projects/tf-models/research/object_detection/core/preprocessor.py:1205: calling squeeze (from tensorflow.python.ops.array_ops) with squeeze_dims is deprecated and will be removed in a future version.
Instructions for updating:
Use the axis argument instead
WARNING:root:Variable [MobilenetV1/Conv2d_0/BatchNorm/beta] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_0/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_0/BatchNorm/moving_mean] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_0/BatchNorm/moving_variance] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_0/weights] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_depthwise/BatchNorm/beta] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_depthwise/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_depthwise/BatchNorm/moving_mean] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_depthwise/BatchNorm/moving_variance] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_depthwise/depthwise_weights] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_pointwise/BatchNorm/beta] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_pointwise/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_pointwise/BatchNorm/moving_mean] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_pointwise/BatchNorm/moving_variance] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_10_pointwise/weights] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_11_depthwise/BatchNorm/beta] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_11_depthwise/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_11_depthwise/BatchNorm/moving_mean] is not available in checkpoint
... ...
WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/BatchNorm/moving_mean] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/BatchNorm/moving_variance] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/depthwise_weights] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/BatchNorm/beta] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/BatchNorm/gamma] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_mean] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/BatchNorm/moving_variance] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/weights] is not available in checkpoint

oliviaolivia700 commented 6 years ago

@yimintsai same issue. same error message /_\

wishinger-li commented 5 years ago

So have you solved the problem???

oliviaolivia700 commented 5 years ago

i simply used another model

wishinger-li commented 5 years ago

@yimintsai @tanndx17 have you solved the problem???

wishinger-li commented 5 years ago

@oliviaolivia700 could you tell me which model you switched to?

kodai3 commented 5 years ago

@wishinger-li have you found any solution or workaround? I'm facing the same problem, and it fails on every "v2" model in the model zoo.

oliviaolivia700 commented 5 years ago

Replying to "@oliviaolivia700 could you tell me which model you switch?":

I switched to ssd_inception_v2_coco.

sudheerExperiments commented 5 years ago

I also encountered this problem: a mismatch between the pretrained model and the config for ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync.

Having the same issue. Did anybody manage to find a solution for this?

knumat commented 5 years ago

@tanndx17 and @sudheerExperiments The warning message in the original issue said "is available in checkpoint, but has an incompatible shape with model variable." while your warning message said "is not available in checkpoint". It sounds like you are trying to load a checkpoint that is completely incompatible with your model, while the original issue describes a checkpoint that has a parameter size mismatch (3x3 vs 1x1). I would suggest opening a separate issue if you are still having trouble.
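To make the distinction between the two warning types concrete, here is a minimal Python sketch that sorts log text into "missing from checkpoint" and "present but wrong shape". The regular expressions are built from the warning wording quoted in this thread; the function name `classify_warnings` and the sample log are hypothetical and only for illustration.

```python
import re
from collections import defaultdict

# Patterns built from the two warning texts quoted in this thread.
MISSING = re.compile(r"Variable \[([^\]]+)\] is not available in checkpoint")
SHAPE_MISMATCH = re.compile(
    r"Variable \[([^\]]+)\] is available in checkpoint, "
    r"but has an incompatible shape with model variable"
)


def classify_warnings(log_text):
    """Group variable names by the kind of restore warning they triggered."""
    groups = defaultdict(list)
    for name in MISSING.findall(log_text):
        groups["missing"].append(name)
    for name in SHAPE_MISMATCH.findall(log_text):
        groups["shape_mismatch"].append(name)
    return dict(groups)


# Hypothetical sample: one warning of each kind, run together on one line
# the way the logs appear in the comments above.
sample_log = (
    "WARNING:root:Variable [MobilenetV1/Conv2d_0/weights] "
    "is not available in checkpoint "
    "WARNING:root:Variable [FeatureExtractor/MobilenetV2/"
    "layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, "
    "but has an incompatible shape with model variable."
)
result = classify_warnings(sample_log)
print(result)
```

If nearly every variable lands in the "missing" group, the checkpoint and the model probably do not describe the same network at all; a handful of "shape_mismatch" entries points to a narrower configuration difference.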

knumat commented 5 years ago

I've seen the same issue as @yimintsai when attempting to use the non-quantized MobileNet v2 COCO checkpoint from the model zoo (ssd_mobilenet_v2_coco_2018_03_29.tar.gz). I switched to the quantized version (ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03.tar.gz) and the warning messages went away. Here are the detailed warning messages I saw:

W0624 17:13:19.480168 140443818018688 variables_helper.py:149] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
W0624 17:13:19.480344 140443818018688 variables_helper.py:149] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0624 17:13:19.480508 140443818018688 variables_helper.py:149] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0624 17:13:19.480628 140443818018688 variables_helper.py:149] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.

Not positive, but it seems like this issue may be caused by the following commit (changed kernel from 3 to 1 and removed use_depthwise). This commit occurred after the model zoo checkpoint was generated (2018-03-29) and before the 1.12.0 release, so this issue probably affects everyone trying to use this model zoo checkpoint. https://github.com/tensorflow/models/commit/324d6dc3b52d7219d4ef48b1e4b7f9e4086f11fd#diff-1976a26347451b0de26f1a11dcf376c9L56

I think the model zoo checkpoint needs to be regenerated to fix this issue. In the meantime, people should consider using the quantized version instead (ssd_mobilenet_v2_quantized_coco).
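The shape pairs in the warnings above are what one would expect if the checkpoint was built with depthwise-separable extra layers and the model with standard 3x3 convolutions. The sketch below only reproduces the arithmetic. It assumes that, in the depthwise variant, the 1x1 pointwise convolution stores its kernel under the same `weights` name as the standard convolution would; that naming is an assumption drawn from the warnings, not something confirmed in this thread. Both helper functions are hypothetical.

```python
def standard_conv_weights(kernel, in_ch, out_ch):
    """Weight shape of a regular conv2d layer: [k, k, in, out]."""
    return {"weights": [kernel, kernel, in_ch, out_ch]}


def depthwise_separable_weights(kernel, in_ch, out_ch, depth_multiplier=1):
    """Weight shapes of a depthwise conv followed by a 1x1 pointwise conv."""
    return {
        "depthwise_weights": [kernel, kernel, in_ch, depth_multiplier],
        # Assumption: the pointwise kernel reuses the plain "weights" name.
        "weights": [1, 1, in_ch * depth_multiplier, out_ch],
    }


# (input channels, output channels) pairs taken from the warnings in this thread.
LAYERS = [(256, 512), (128, 256), (128, 256), (64, 128)]

for in_ch, out_ch in LAYERS:
    model_shape = standard_conv_weights(3, in_ch, out_ch)["weights"]
    ckpt_shape = depthwise_separable_weights(3, in_ch, out_ch)["weights"]
    print(f"model {model_shape} vs checkpoint {ckpt_shape}")
```

Under that assumption, each printed pair matches the "model variable shape" and "Checkpoint shape" reported for the corresponding layer.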

knumat commented 5 years ago

I seem to have found a better workaround for this issue. Instead of trying to use ssd_mobilenet_v2_coco.config in the tensorflow/models Git repo, I just used the pipeline.config file that is included with the pre-trained checkpoint.

Besides the normal settings that need changing, you also need to remove the batch_norm_trainable flag from pipeline.config, since that flag doesn't exist in the released version of TensorFlow. More info on that issue is here: https://github.com/tensorflow/models/issues/4066
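As a rough illustration of the flag removal described above, the following Python sketch drops any line that consists only of the `batch_norm_trainable` setting from the text of a pipeline.config. It is a plain line filter, not a protobuf parser, so it assumes the flag sits on its own line. The function name and the sample fragment are hypothetical and are not copied from the real file.

```python
import re

# Matches a line that holds only the flag, e.g. "  batch_norm_trainable: true".
FLAG_LINE = re.compile(r"^\s*batch_norm_trainable\s*:\s*(true|false)\s*$")


def strip_batch_norm_trainable(config_text):
    """Return config_text without any standalone batch_norm_trainable line."""
    kept = [line for line in config_text.splitlines() if not FLAG_LINE.match(line)]
    return "\n".join(kept) + "\n"


# Hypothetical fragment for illustration only.
sample = """\
feature_extractor {
  type: "ssd_mobilenet_v2"
  depth_multiplier: 1.0
  batch_norm_trainable: true
  use_depthwise: true
}
"""
cleaned = strip_batch_norm_trainable(sample)
print(cleaned)
```

To apply it to a real file, read the file's text, pass it through the function, and write the result to a new path so the original pipeline.config stays untouched.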

Of course, the proper fix is to regenerate the ssd_mobilenet_v2_coco pre-trained checkpoint on the Model Zoo so that it is compatible with the ssd_mobilenet_v2_coco.config file in the tensorflow/models Git repo (v1.12 and later).

On the other hand, this whole problem might go away when object detection support is ported to TensorFlow 2.0, which seems to be focused on the Python-based Keras API instead of using config files to set up the pipeline. Per the following issue, it sounds like that "may take months": https://github.com/tensorflow/models/issues/6423

tensorflowbutler commented 4 years ago

Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

amahendrakar commented 4 years ago

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

Abiganesh commented 1 year ago

I'm trying to do transfer learning on ssd_mobilenet_v2_quantized_coco. I have changed all the necessary variables and paths as mentioned in the TF Object Detection API 1.14, but I'm getting a reshape error. My total number of classes is 1; if I change num_classes to 90, training starts fine.

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [12] rhs shape= [546]
[[{{node save/Assign_87}}]]
[[save/RestoreV2/_2442]]
(1) Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [12] rhs shape= [546]
[[{{node save/Assign_87}}]]
0 successful operations.
0 derived errors ignored.
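The two sizes in this error fit a simple pattern. An SSD class predictor emits one score per class plus a background slot for every anchor at a location, so its bias length is anchors × (num_classes + 1). The sketch below assumes 6 anchors per location, which is not stated anywhere in the post; the function name is hypothetical.

```python
def class_predictor_size(num_classes, anchors_per_location):
    """Length of the class predictor bias for one feature-map location."""
    # One score per class plus one background slot, for every anchor.
    return anchors_per_location * (num_classes + 1)


ANCHORS_PER_LOCATION = 6  # assumption, not stated in the thread

coco_size = class_predictor_size(90, ANCHORS_PER_LOCATION)
single_class_size = class_predictor_size(1, ANCHORS_PER_LOCATION)
print(coco_size, single_class_size)  # 546 12
```

Under that assumption, `[546]` corresponds to a 90-class head and `[12]` to a 1-class head, which would suggest that the restore step is trying to load the 90-class prediction head into the 1-class model. The post does not make clear why those variables are being restored.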