tensorflow / models

Models and examples built with TensorFlow

[deeplab] InvalidArgumentError, Restoring from checkpoint failed. Assign requires shapes of both tensors to match. #5390

Closed chenmengyang closed 6 years ago

chenmengyang commented 6 years ago

Hi, I got an error when trying to run train.py on the Cityscapes dataset. Here is the command:

python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=1 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=769 \
    --train_crop_size=769 \
    --train_batch_size=1 \
    --dataset="cityscapes" \
    --tf_initial_checkpoint="/home/username/tf/models/research/deeplab/datasets/cityscapes/init_models/deeplabv3_pascal_train_aug/model.ckpt" \
    --train_logdir="/home/username/tf/models/research/deeplab/datasets/cityscapes/exp/train_on_trainval_set/train" \
    --dataset_dir="/home/username/tf/models/research/deeplab/datasets/cityscapes/tfrecord"

The error message is:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,256,19] rhs shape= [1,1,256,21]
     [[Node: save/Assign_115 = Assign[T=DT_FLOAT, _class=["loc:@logits/semantic/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](logits/semantic/weights/Momentum, save/RestoreV2:115)]]

I was using the xception_65 init checkpoint and I set the right train_logdir argument. When I switch to the mobilenet_v2 init checkpoint, I still get the same error message.

This error did not show up when I was training on the voc_2012 dataset, but for cityscapes and ade20k I always get the error.
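The shapes in the error message point at the cause. The following is an illustrative sketch (not deeplab code): the final 1x1 convolution that produces the semantic logits has a kernel of shape [1, 1, depth, num_classes], so its trailing dimension differs between a PASCAL VOC checkpoint (21 classes) and a Cityscapes graph (19 classes), which is exactly the lhs/rhs mismatch reported above.

```python
# Hypothetical helper: shape of the semantic-logits kernel as a function
# of the number of classes. The depth of 256 matches the [1,1,256,N]
# shapes in the error message.
def logits_kernel_shape(num_classes, depth=256):
    return [1, 1, depth, num_classes]

graph_shape = logits_kernel_shape(19)       # Cityscapes graph (lhs)
checkpoint_shape = logits_kernel_shape(21)  # PASCAL VOC checkpoint (rhs)
print(graph_shape, checkpoint_shape, graph_shape == checkpoint_shape)
```

Restoring fails because TensorFlow's Assign op requires both tensors to have identical shapes; only the last dimension differs here.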

xianshunw commented 6 years ago

how did you solve this problem?

FranklynJey commented 5 years ago

This might be too late to answer the original question, but it might help others. I received a similar error when trying to load a Cityscapes checkpoint without having --dataset="cityscapes" specified. Error:

Assign requires shapes of both tensors to match. lhs shape= [21] rhs shape= [19])

This is not surprising, as the default dataset specification is pascal_voc_seg with 21 classes, while I provided a pretrained Cityscapes checkpoint with a 19-class decoder. Just adding --dataset="cityscapes" did the trick for me.

Make sure the used init-checkpoint fits your dataset specification. Cheers ;)
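The class counts behind these mismatches can be summarized in a small lookup. This is a hedged sketch, not the actual segmentation_dataset.py code; the counts are the commonly cited values for deeplab's built-in datasets and match the errors in this thread (21 for PASCAL VOC, 19 for Cityscapes).

```python
# Assumed class counts per dataset flag value.
NUM_CLASSES = {
    'pascal_voc_seg': 21,  # the default dataset: 20 classes + background
    'cityscapes': 19,
    'ade20k': 151,
}

def check_compat(dataset_flag, checkpoint_classes):
    """Raise if the checkpoint's logits would not fit the chosen dataset."""
    expected = NUM_CLASSES[dataset_flag]
    if expected != checkpoint_classes:
        raise ValueError(
            'checkpoint has %d-class logits but --dataset=%s builds a '
            '%d-class graph' % (checkpoint_classes, dataset_flag, expected))

check_compat('cityscapes', 19)  # fine: shapes match
# check_compat('pascal_voc_seg', 19) would raise, reproducing the
# [21] vs [19] mismatch described above.
```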

wnov commented 5 years ago

Check whether the number of classes in your config matches the checkpoint, and try clearing the checkpoint files in your train directory.

karanchahal commented 5 years ago

What do I do when my checkpoint doesn't have the same number of classes? I am trying to train on my own dataset and want to use the MobileNet pretrained checkpoint. Is that not possible, since the MobileNet version was trained on Cityscapes, which has 19 classes, while my training dataset has just 2 classes? Do I have to train from scratch?

FranklynJey commented 5 years ago

Hi @karanchahal, take a look at #3730 (comment).

No, you don't have to train it from scratch. If you have your own dataset but want to reuse the pre-trained feature encoder (also called the backbone), try adding --initialize_last_layer=False --last_layers_contain_logits_only=False

Hope this helps, cheers ;)

AkhilaPerumalla123 commented 5 years ago

@FranklynJey I am new to deep learning. Can you please let me know where to add --initialize_last_layer=False --last_layers_contain_logits_only=False? If I add them to the config file, it throws the error below.

  File "/home/NiftyNet/niftynet/__init__.py", line 66, in main
    system_param, input_data_param = user_parameters_parser.run()
  File "/home/NiftyNet/niftynet/utilities/user_parameters_parser.py", line 150, in run
    _check_config_file_keywords(config)
  File "/home/NiftyNet/niftynet/utilities/user_parameters_parser.py", line 280, in _check_config_file_keywords
    _raises_bad_keys(config_keywords, error_info='config file')
  File "/home/NiftyNet/niftynet/utilities/user_parameters_parser.py", line 315, in _raises_bad_keys
    key, closest, EPILOG_STRING, error_info))
ValueError: Unknown keywords in config file: By "initialize_last_layer" did you mean "antialiasing"? "initialize_last_layer" is not a valid option.

FranklynJey commented 5 years ago

Can you post the whole config file ?

AkhilaPerumalla123 commented 5 years ago

[promise12]
path_to_search = /home/Container_data/Nifti/Images_nii
filename_contains = nii
spatial_window_size = (64, 64, 1)
interp_order = 3
axcodes = (A, R, S)

[label]
path_to_search = /home/Container_data/Nifti/Annotations_colored_nii
filename_contains = nii
spatial_window_size = (64, 64, 1)
interp_order = 0
axcodes = (A, R, S)

############################## system configuration sections
[SYSTEM]
cuda_devices = ""
num_threads = 2
num_gpus = 4
model_dir = ./dense_vnet_abdominal_ct

[NETWORK]
name = dense_vnet
activation_function = prelu
batch_size = 1

# volume level preprocessing
volume_padding_size = 0

# histogram normalisation
histogram_ref_file = standardisation_models.txt
norm_type = percentile
cutoff = (0.01, 0.99)
normalisation = True
whitening = True
normalise_foreground_only = True
foreground_type = otsu_plus
multimod_foreground_type = and
window_sampling = resize

queue_length = 8

[TRAINING]
sample_per_volume = 4
rotation_angle = (-10.0, 10.0)
scaling_percentage = (-10.0, 10.0)
random_flipping_axes = 1
lr = 0.0002
loss_type = Dice
starting_iter = -1
save_every_n = 1250
max_iter = 25000
max_checkpoints = 20

############################ custom configuration sections
[SEGMENTATION]
image = promise12
label = label
output_prob = False
num_classes = 256
label_normalisation = True
min_numb_labels = 2
min_sampling_ratio = 0.0001

FranklynJey commented 5 years ago

So, the suggestions made above are for the deeplab model; specifically, you add the flags to the bash scripts within the deeplab framework. You appear to be using a different model, and the file you posted is not a bash script.

You might find help by looking for threads addressing your specific model. I wish you good luck! Best, Frank

ramesh8v commented 5 years ago

@FranklynJey and others: I tried, but I'm still getting the error: Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [21]

My dataset contains 4 classes including the background. My training parameters exactly look like this:

NUM_ITERATIONS=1000
python "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="train" \
  --model_variant="xception_65" \
  --dataset="myData" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --initialize_last_layer = False \
  --last_layers_contain_logits_only = True   \  ##also tried False
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --tf_initial_checkpoint="${INIT_FOLDER}/model.ckpt" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${DATASET}" 

Any thoughts?

FranklynJey commented 5 years ago

First, try to remove the whitespaces:

--initialize_last_layer=False  \
--last_layers_contain_logits_only=False  \

Does that fix your issue? If not, make sure segmentation_dataset.py is configured correctly.
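The whitespace matters because of how the shell tokenizes arguments: with spaces around the equals sign, '=' and 'False' arrive as separate argv tokens, so the parser consumes '=' as the flag's value. Deeplab uses TensorFlow/absl flags rather than argparse, but the tokenization problem is the same; the following is an illustrative argparse sketch.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--initialize_last_layer')

# "--initialize_last_layer=False" is one token and parses as intended:
ok, _ = parser.parse_known_args(['--initialize_last_layer=False'])
print(ok.initialize_last_layer)  # 'False'

# "--initialize_last_layer = False" becomes three tokens; '=' is taken
# as the flag's value and 'False' is left over unparsed:
bad, leftover = parser.parse_known_args(
    ['--initialize_last_layer', '=', 'False'])
print(bad.initialize_last_layer, leftover)  # '=' ['False']
```

So the flag never actually receives False, which is why removing the spaces fixes the error.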

ramesh8v commented 5 years ago

Yes, removing whitespaces helped me, thanks!

zheyuanWang commented 4 years ago

quote from the code:

#Set to False if one does not want to re-use the trained classifier weights.

flags.DEFINE_boolean('initialize_last_layer', True,
                     'Initialize the last layer.')

flags.DEFINE_boolean('last_layers_contain_logits_only', False,
                     'Only consider logits as last layers or not.')

Setting the initialize_last_layer flag to False excludes the last layers of the network from checkpoint restoration:

# Variables that will not be restored.
  exclude_list = ['global_step']
  if not initialize_last_layer:
    exclude_list.extend(last_layers)

The second flag mentioned above, last_layers_contain_logits_only, is set to False by default.
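The effect of the exclude list can be shown with a minimal sketch (a hypothetical helper mirroring the deeplab logic quoted above, not the actual train_utils code): variables whose names start with an excluded prefix are simply not restored from the checkpoint, so a class-count mismatch in the logits layer no longer matters once initialize_last_layer=False.

```python
def variables_to_restore(all_variables, last_layers, initialize_last_layer):
    """Filter out variables that should not be restored from the checkpoint."""
    exclude_list = ['global_step']
    if not initialize_last_layer:
        exclude_list.extend(last_layers)
    return [v for v in all_variables
            if not any(v.startswith(prefix) for prefix in exclude_list)]

variables = [
    'xception_65/entry_flow/conv1_1/weights',  # backbone: restored
    'logits/semantic/weights',                 # class-dependent: excluded
    'global_step',                             # always excluded
]
print(variables_to_restore(variables, ['logits'], initialize_last_layer=False))
# -> ['xception_65/entry_flow/conv1_1/weights']
```

The backbone weights are reused while the mismatched logits layer is trained fresh, which is exactly what the --initialize_last_layer=False advice earlier in this thread achieves.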