mmxuan18 opened this issue 5 years ago
I am also getting this error. Did you (@mlinxiang, @amahendrakar) find any solution?
The command to run the model will be provided soon. Please stay tuned.
Sorry for the late update. Please use the following flags in addition to --model_variant="mobilenet_v3_large_seg" for the large v3 model or --model_variant="mobilenet_v3_small_seg" for the small one (see the example command after the flag list):
--image_pooling_crop_size=769,769
--image_pooling_stride=4,5
--aspp_convs_filters=128
--aspp_with_concat_projection=0
--aspp_with_squeeze_and_excitation=1
--decoder_use_sum_merge=1
--decoder_filters=19
--decoder_output_is_logits=1
--image_se_uses_qsigmoid=1
--image_pyramid=1
--decoder_output_stride=8
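For concreteness, here is a minimal sketch of how these flags might be combined with train.py; the ${...} paths are placeholders, and --output_stride=32 follows the discussion further down in this thread:

# Sketch only: placeholder paths; flag values follow the list above.
python3 train.py \
  --logtostderr \
  --model_variant="mobilenet_v3_large_seg" \
  --dataset="cityscapes" \
  --tf_initial_checkpoint="${INIT_CKPT}" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${DATASET_DIR}" \
  --output_stride=32 \
  --image_pooling_crop_size=769,769 \
  --image_pooling_stride=4,5 \
  --aspp_convs_filters=128 \
  --aspp_with_concat_projection=0 \
  --aspp_with_squeeze_and_excitation=1 \
  --decoder_use_sum_merge=1 \
  --decoder_filters=19 \
  --decoder_output_is_logits=1 \
  --image_se_uses_qsigmoid=1 \
  --image_pyramid=1 \
  --decoder_output_stride=8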
@YknZhu
It seems that the image_pooling_stride in the paper is [16, 20], but you use [4, 5]. Which is the correct option?
@yxftju Thanks for the question! In the paper, image_pooling_stride = [16, 20] is used for the output-stride-16 model variant, while here we provide an example with output stride = 32. The final feature map resolution is two times smaller, so we need to compensate for this effect by halving the pooling stride. Using image_pooling_stride = [8, 10] or [4, 5] should give you similar results in this model variant (output stride = 32).
Note that the default value of output stride (e.g., here) is 16, so we need to set it to 32 for this case (as shown in the model zoo, the provided mobilenetv3_{small,large}_cityscapes_trainfine model variants have Eval OS = 32). We will make this clear in a following update. Thanks again for pointing out this issue.
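To make the compensation concrete, a sketch of the two consistent settings (values taken from this thread; the halving follows from the feature map being two times smaller at output stride 32):

# Output stride 16 (paper setting): a 769x769 crop gives a ~49x49 feature map,
# so the paper's pooling stride [16, 20] applies.
--output_stride=16 --image_pooling_crop_size=769,769 --image_pooling_stride=16,20

# Output stride 32 (this example): the feature map is ~25x25, two times smaller,
# so the pooling stride is halved; [4, 5] behaves similarly.
--output_stride=32 --image_pooling_crop_size=769,769 --image_pooling_stride=8,10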
@aquariusjay @YknZhu: Thanks for sharing the pretrained DeepLab Cityscapes models. I'm currently working on converting the Cityscapes model to TF Lite. Do you have any advice on how to accomplish this? The existing TF Lite conversion scripts in the DeepLab repo don't seem to work for me.
Sincerely, Johan
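For anyone attempting the same conversion, here is a hedged sketch of one possible route using the TF1 tflite_convert CLI. The tensor names (ImageTensor / SemanticPredictions) and flags are assumptions based on DeepLab's export_model.py defaults; this is untested for the MobileNet-v3 variants and may fail at exactly the point where the existing scripts do:

# Freeze the checkpoint first (pass the mobilenet_v3 flags listed earlier as well).
python export_model.py \
  --checkpoint_path="${CKPT_PATH}" \
  --export_path=./frozen_graph.pb \
  --model_variant="mobilenet_v3_large_seg" \
  --num_classes=19 \
  --crop_size=769 --crop_size=769

# Then convert the frozen graph to TF Lite.
tflite_convert \
  --graph_def_file=./frozen_graph.pb \
  --output_file=./deeplab_mnv3.tflite \
  --input_arrays=ImageTensor \
  --output_arrays=SemanticPredictions \
  --input_shapes=1,769,769,3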
Hello @aquariusjay, @YknZhu, I tried to train mobilenet_v3 on the pascal_voc_seg dataset. However, I realized that the provided command is specific to Cityscapes, as I got the following error:
logits = tf.reshape(logits, shape=[-1, num_classes])
Invalid argument: Input to reshape is a tensor with 39923750 values, but the requested shape requires a multiple of 21
39923750 is divisible by 19 (the number of Cityscapes classes: 39923750 = 19 x 2101250) but not by 21 (the number of pascal_voc_seg classes). I changed --decoder_filters=19 to 21, but that generated another error:
ValueError: Total size of new array must be unchanged for decoder/feature_projection0/weights lh_shape: [(1, 1, 120, 19)], rh_shape: [(1, 1, 120, 21)]
Could you please provide a generic command that works for other datasets as well?
My command is:

python3 "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="mobilenet_v3_large_seg" \
  --train_crop_size="1025,1025" \
  --train_batch_size=4 \
  --dataset="pascal_voc_seg" \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=false \
  --tf_initial_checkpoint="deeplab_mnv3_large_cityscapes_trainfine" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="dataset/tfrecord" \
  --image_pooling_crop_size=769,769 \
  --image_pooling_stride=4,5 \
  --add_image_level_feature=1 \
  --aspp_convs_filters=128 \
  --aspp_with_concat_projection=0 \
  --aspp_with_squeeze_and_excitation=1 \
  --decoder_use_sum_merge=1 \
  --decoder_filters=19 \
  --decoder_output_is_logits=1 \
  --image_se_uses_qsigmoid=1 \
  --decoder_output_stride=8 \
  --output_stride=32 \
  --image_pyramid=1
Hi @omair50,
You are using the Cityscapes-trained checkpoint for initialization, which has a different number of classes. To resolve the issue, you could either (1) train the model from scratch, or (2) set --initialize_last_layer=false so that the last layer, whose shape depends on the number of classes, is not restored from the checkpoint.
Maybe you could try the second option, since training the model from scratch takes weeks to converge.
Cheers,
Hello @aquariusjay, thanks for the recommendations. I followed the 2nd option, but it generated another error:
ValueError: Total size of new array must be unchanged for decoder/feature_projection0/weights lh_shape: [(1, 1, 120, 19)], rh_shape: [(1, 1, 120, 21)]
I then set --last_layers_contain_logits_only=False (i.e., only reuse the MobileNet-v3 backbone weights), which generated the following error:
Invalid argument: logits and labels must be broadcastable: logits_size=[19961875,21] labels_size=[2101250,21] [[node semantic_merged_logits/pixel_losses (defined at ..../utils/train_utils.py:151) ]]
Finally, I set --decoder_filters=21 and the training went smoothly.
Thus, for anyone interested in training mnv3 on a custom dataset, these are the tested settings (in addition to those mentioned above): --initialize_last_layer=false --last_layers_contain_logits_only=False --decoder_filters=<number of classes>. See the consolidated example command below.
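Putting the pieces together, a minimal sketch of the resulting pascal_voc_seg command; paths are placeholders, and the flag values follow the posts above, with the last three flags being the custom-dataset fixes (21 classes):

# Sketch only: the last three flags are the fixes found in this thread.
python3 "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="mobilenet_v3_large_seg" \
  --train_crop_size="1025,1025" \
  --train_batch_size=4 \
  --dataset="pascal_voc_seg" \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=false \
  --tf_initial_checkpoint="${INIT_CKPT}" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="dataset/tfrecord" \
  --image_pooling_crop_size=769,769 \
  --image_pooling_stride=4,5 \
  --add_image_level_feature=1 \
  --aspp_convs_filters=128 \
  --aspp_with_concat_projection=0 \
  --aspp_with_squeeze_and_excitation=1 \
  --decoder_use_sum_merge=1 \
  --decoder_output_is_logits=1 \
  --image_se_uses_qsigmoid=1 \
  --decoder_output_stride=8 \
  --output_stride=32 \
  --image_pyramid=1 \
  --initialize_last_layer=false \
  --last_layers_contain_logits_only=False \
  --decoder_filters=21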
Hi @omair50,
Thanks for digging into the details. What you are doing is correct. Since you are using the Lite R-ASPP decoder module, we need to set decoder_filters = number of classes. For reference, please see Fig. 10 of the MobileNet-v3 paper.
Cheers,
I have a question. When using --initialize_last_layer=false --last_layers_contain_logits_only=False --decoder_filters=<number of classes>, does it use the pretrained weights, or does it start from scratch?
deeplab/ckpt/deeplab_mnv3_small_cityscapes_trainfine/model.ckpt
Traceback (most recent call last):
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/train.py", line 464, in <module>
    tf.app.run()
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/train.py", line 444, in main
    ignore_missing_vars=True)
  File "/Users//DeepLearning/ml/hand/deeplab/fingertip_seg_train/deeplab/utils/train_utils.py", line 221, in get_model_init_fn
    ignore_missing_vars=ignore_missing_vars)
  File "/Users//anaconda3/envs/mpy/lib/python3.6/site-packages/tensorflow_core/contrib/framework/python/ops/variables.py", line 690, in assign_from_checkpoint
    (ckpt_name, str(ckpt_value.shape), str(var.get_shape())))
ValueError: Total size of new array must be unchanged for image_pooling/weights lh_shape: [(1, 1, 288, 128)], rh_shape: [(1, 1, 288, 256)]
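The mismatch (128 filters in the checkpoint vs. 256 in the freshly built graph for image_pooling/weights) suggests the graph was built without --aspp_convs_filters=128 from the flag list above, so it fell back to a larger default. A hedged guess at the fix, assuming the deeplab_mnv3_small_cityscapes_trainfine checkpoint was trained with the flags given earlier in this thread:

# Build the model with 128 ASPP filters so image_pooling/weights matches the checkpoint;
# the other mobilenet_v3_small_seg flags from the top of this thread should be passed too.
--aspp_convs_filters=128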