tensorflow / models

Models and examples built with TensorFlow

[deeplab] deeplab with mobilenetv3 small pretrained #7869

Open mmxuan18 opened 4 years ago

mmxuan18 commented 4 years ago

deeplab/ckpt/deeplab_mnv3_small_cityscapes_trainfine/model.ckpt

Traceback (most recent call last):
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/train.py", line 464, in <module>
    tf.app.run()
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/train.py", line 444, in main
    ignore_missing_vars=True)
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/utils/train_utils.py", line 221, in get_model_init_fn
    ignore_missing_vars=ignore_missing_vars)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 690, in assign_from_checkpoint
    (ckpt_name, str(ckpt_value.shape), str(var.get_shape())))
ValueError: Total size of new array must be unchanged for image_pooling/weights lh_shape: [(1, 1, 288, 128)], rh_shape: [(1, 1, 288, 256)]

omair50 commented 4 years ago

I am also getting this error. Did you (@mlinxiang, @amahendrakar) find any solution?

aquariusjay commented 4 years ago

Command to run the model will be provided soon. Please stay tuned.

YknZhu commented 4 years ago

Sorry for the late update. Please use the following flags in addition to --model_variant="mobilenet_v3_large_seg" for the large v3 model or --model_variant="mobilenet_v3_small_seg" for the small one.

--image_pooling_crop_size=769,769
--image_pooling_stride=4,5
--aspp_convs_filters=128
--aspp_with_concat_projection=0 
--aspp_with_squeeze_and_excitation=1 
--decoder_use_sum_merge=1 
--decoder_filters=19 
--decoder_output_is_logits=1 
--image_se_uses_qsigmoid=1 
--image_pyramid=1 
--decoder_output_stride=8
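For reference, these flags might be combined into a single training command along the lines of the sketch below. This is only an illustration, not an officially provided command: the split name, crop/batch sizes, iteration count, and checkpoint/dataset paths are placeholder assumptions, and --output_stride=32 follows the discussion further down in this thread.

python "${WORK_DIR}"/train.py \
  --logtostderr \
  --model_variant="mobilenet_v3_small_seg" \
  --dataset="cityscapes" \
  --train_split="train" \
  --train_crop_size="769,769" \
  --train_batch_size=8 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --output_stride=32 \
  --image_pooling_crop_size=769,769 \
  --image_pooling_stride=4,5 \
  --aspp_convs_filters=128 \
  --aspp_with_concat_projection=0 \
  --aspp_with_squeeze_and_excitation=1 \
  --decoder_use_sum_merge=1 \
  --decoder_filters=19 \
  --decoder_output_is_logits=1 \
  --image_se_uses_qsigmoid=1 \
  --image_pyramid=1 \
  --decoder_output_stride=8 \
  --tf_initial_checkpoint="deeplab_mnv3_small_cityscapes_trainfine/model.ckpt" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${DATASET_DIR}"
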
yxftju commented 4 years ago

@YknZhu

It seems that the image_pooling_stride in the paper is [16, 20], but you use [4, 5]. Which is the correct option?

> --image_pooling_crop_size=769,769
> --image_pooling_stride=4,5
> --aspp_convs_filters=128
> --aspp_with_concat_projection=0
> --aspp_with_squeeze_and_excitation=1
> --decoder_use_sum_merge=1
> --decoder_filters=19
> --decoder_output_is_logits=1
> --image_se_uses_qsigmoid=1
> --image_pyramid=1
> --decoder_output_stride=8
aquariusjay commented 4 years ago

@yxftju Thanks for the question! In the paper, image_pooling_stride = [16, 20] is used for the output stride = 16 model variant, while here we provide an example with output stride = 32. The final feature map resolution is two times smaller, and thus we need to compensate for this effect. Finally, using image_pooling_stride = [8, 10] or [4, 5] should give you similar results in this model variant (output stride = 32).

Note that the default value of output stride (e.g., here) is 16. We need to use 32 in this case (as shown in the model zoo, the provided mobilenetv3_{small,large}_cityscapes_trainfine model variants have Eval OS = 32). We will make this clear in the following update. Thanks again for pointing out this issue.
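
As a rough sanity check on the numbers: with the 769x769 image-pooling crop from the flags above, the backbone feature map is about 769/16 ≈ 48 pixels per side at output stride 16 and about 769/32 ≈ 24 at output stride 32, so halving the pooling stride from [16, 20] to [8, 10] (or quartering it to [4, 5]) keeps the pooled grid at a comparable resolution.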

titanbender commented 4 years ago

@aquariusjay @YknZhu: Thanks for sharing the pretrained DeepLab Cityscapes models. I'm currently working on converting the Cityscapes model to TF Lite. Do you have any advice on how to accomplish this? The existing TF Lite conversion scripts in the deeplab repo don't seem to work for me.

Sincerely, Johan

omair50 commented 4 years ago

Hello @aquariusjay, @YknZhu, I tried to train mobilenetv3 on the pascal_voc_seg dataset. However, I realized that the provided command is specific to Cityscapes, as I got the following error:

logits = tf.reshape(logits, shape=[-1, num_classes])
Invalid argument: Input to reshape is a tensor with 39923750 values, but the requested shape requires a multiple of 21

39923750 is divisible by 19 (the number of Cityscapes classes) but not by 21 (the number of pascal_voc_seg classes). I changed --decoder_filters from 19 to 21, but that generated another error:

ValueError: Total size of new array must be unchanged for decoder/feature_projection0/weights lh_shape: [(1, 1, 120, 19)], rh_shape: [(1, 1, 120, 21)]

Could you please share a generic command that is also valid for other datasets?

My command is

python3 "${WORK_DIR}"/train.py \ --logtostderr \ --train_split="train" \ --model_variant="mobilenet_v3_large_seg" \ --train_crop_size="1025,1025" \ --train_batch_size=4 \ --dataset="pascal_voc_seg" \ --training_number_of_steps="${NUM_ITERATIONS}" \ --fine_tune_batch_norm=false \ --tf_initial_checkpoint="deeplab_mnv3_large_cityscapes_trainfine " \ --train_logdir="${TRAIN_LOGDIR}" \ --dataset_dir="dataset/tfrecord" \ --image_pooling_crop_size=769,769 \ --image_pooling_stride=4,5 \ --add_image_level_feature=1 \ --aspp_convs_filters=128 \ --aspp_with_concat_projection=0 \ --aspp_with_squeeze_and_excitation=1 \ --decoder_use_sum_merge=1 \ --decoder_filters=19 \ --decoder_output_is_logits=1 \ --image_se_uses_qsigmoid=1 \ --decoder_output_stride=8 \ --output_stride=32 \ --image_pyramid=1 \

aquariusjay commented 4 years ago

Hi @omair50,

You are using the Cityscapes-trained checkpoint for initialization, which has a different number of classes. To resolve the issue, you could either:

  1. Not use the Cityscapes-trained checkpoint (set tf_initial_checkpoint to None, and the model will be trained from scratch), or
  2. Set --initialize_last_layer=false, which will skip loading the last layer.

Maybe you could try the second option, since training a model from scratch takes weeks to converge.

Cheers,

omair50 commented 4 years ago

Hello @aquariusjay, thanks for the recommendations. I followed the 2nd option, but it generated another error:
ValueError: Total size of new array must be unchanged for decoder/feature_projection0/weights lh_shape: [(1, 1, 120, 19)], rh_shape: [(1, 1, 120, 21)]

I then set --last_layers_contain_logits_only=False (i.e., only the MobileNet-v3 backbone is initialized from the checkpoint), and this generated the following further error:

Invalid argument: logits and labels must be broadcastable: logits_size=[19961875,21] labels_size=[2101250,21] [[node semantic_merged_logits/pixel_losses (defined at ..../utils/train_utils.py:151) ]]

Finally, I set --decoder_filters=21 and the training went smoothly.

Thus, for anyone interested in training mnv3 on a custom dataset, these are the tested settings (in addition to the ones mentioned above): --initialize_last_layer=false --last_layers_contain_logits_only=False --decoder_filters=<number of classes>
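
Collected in one place, the flag set that ended up working for the custom-dataset run looks roughly like this (a sketch assembled from the command and fixes above; 21 stands in for your own dataset's number of classes, and paths, crop sizes, and step counts stay whatever your setup uses):

--model_variant="mobilenet_v3_large_seg"
--output_stride=32
--image_pooling_crop_size=769,769
--image_pooling_stride=4,5
--aspp_convs_filters=128
--aspp_with_concat_projection=0
--aspp_with_squeeze_and_excitation=1
--decoder_use_sum_merge=1
--decoder_filters=21
--decoder_output_is_logits=1
--decoder_output_stride=8
--image_se_uses_qsigmoid=1
--image_pyramid=1
--initialize_last_layer=false
--last_layers_contain_logits_only=False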

aquariusjay commented 4 years ago

Hi @omair50,

Thanks for digging into the details. What you are doing is correct. Since you are using the Lite R-ASPP decoder module, we need to set decoder_filters = number of classes. For reference, please see Fig. 10 of the MobileNet-v3 paper.

Cheers,

failbetter77 commented 2 years ago

> Hello @aquariusjay, thanks for the recommendations. I followed the 2nd option, but it generated another error:
> ValueError: Total size of new array must be unchanged for decoder/feature_projection0/weights lh_shape: [(1, 1, 120, 19)], rh_shape: [(1, 1, 120, 21)]
>
> I then set --last_layers_contain_logits_only=False (i.e., only the MobileNet-v3 backbone is initialized from the checkpoint), and this generated the following further error:
>
> Invalid argument: logits and labels must be broadcastable: logits_size=[19961875,21] labels_size=[2101250,21] [[node semantic_merged_logits/pixel_losses (defined at ..../utils/train_utils.py:151) ]]
>
> Finally, I set --decoder_filters=21 and the training went smoothly.
>
> Thus, for anyone interested in training mnv3 on a custom dataset, these are the tested settings (in addition to the ones mentioned above): --initialize_last_layer=false --last_layers_contain_logits_only=False --decoder_filters=<number of classes>

I have a question: when using --initialize_last_layer=false, --last_layers_contain_logits_only=False, and --decoder_filters=<number of classes>, does training use the pretrained weights, or does it start from scratch?