mauriceqch / pcc_geo_cnn

Learning Convolutional Transforms for Point Cloud Geometry Compression
MIT License

InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. #9

Open yansir-X opened 4 years ago

yansir-X commented 4 years ago

Dear Mr. Quach, I made some changes to synthesis_transform and analysis_transform and retrained the model. But when I run compression with the newly retrained model, the following error occurs:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [32] [[node save/Assign (defined at /home/stud_yang/.conda/envs/myenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/ ]]

I guess somehow the old model gets involved? I've spent quite some time searching for a solution and trying, with no progress. Maybe you have any ideas?

Sorry to always bother you. :) Best

mauriceqch commented 4 years ago

Hello @Yangsir-X,

By any chance, did you train your model with the --num_filters option?

If so, you also need to add the same --num_filters option for compression and decompression.


yansir-X commented 4 years ago

Thanks for answering. No, I didn't use the --num_filters option. I searched online and found suggestions such as calling tf.reset_default_graph(), but I don't know where I should add that, or whether I should do something else.

I'm really stuck at this problem. Best

mauriceqch commented 4 years ago

I am not sure this would solve your problem.

From the error, it seems that the number of filters differs between training and compression. [128] and [32] look like a shape mismatch for a bias variable. Did you increase the number of filters?

Also, maybe you are still using the path to the old model instead of the new one when running compression?

Can you give me the commands you used before encountering this error? Also, what changes did you make to the model?
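One way to narrow down which variable is mismatched is to compare the shapes stored in the checkpoint with the shapes the current graph expects. The following is only a minimal sketch: the helper `find_shape_mismatches` and the variable names and shapes in the example dictionaries are hypothetical, chosen to mirror the `[128]` vs `[32]` error above. In TensorFlow 1.x the two dictionaries could plausibly be built from `tf.train.list_variables(model_dir)` and `tf.global_variables()`, but that part is not shown here.

```python
def find_shape_mismatches(checkpoint_shapes, graph_shapes):
    """Return {name: (checkpoint_shape, graph_shape)} for variables that
    exist in both mappings but have different shapes."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        graph_shape = graph_shapes.get(name)
        if graph_shape is not None and tuple(graph_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(graph_shape))
    return mismatches


# Hypothetical variable names and shapes, for illustration only.
# With TensorFlow 1.x these could be obtained roughly as:
#   checkpoint_shapes = dict(tf.train.list_variables(model_dir))
#   graph_shapes = {v.op.name: v.shape.as_list() for v in tf.global_variables()}
checkpoint_shapes = {
    "analysis/layer_0/conv3d/bias": [32],
    "analysis/bnorm/batch_normalization/gamma": [32],
}
graph_shapes = {
    "analysis/layer_0/conv3d/bias": [32],
    "analysis/bnorm/batch_normalization/gamma": [128],
}

mismatches = find_shape_mismatches(checkpoint_shapes, graph_shapes)
for name, (ckpt_shape, graph_shape) in sorted(mismatches.items()):
    print(f"{name}: checkpoint {ckpt_shape} vs graph {graph_shape}")
```

A variable whose shape changes between training and compression even though the model definition is the same suggests that its size depends on the input resolution rather than on the number of filters.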

yansir-X commented 4 years ago

The only things I changed are in analysis_transform and synthesis_transform, where I added batch normalization and another conv and deconv layer:

import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.variable_scope)

def analysis_transform(tensor, num_filters, data_format):
    with tf.variable_scope("analysis"):
        with tf.variable_scope("layer_0"):
            layer = tf.layers.Conv3D(
                    num_filters, (9, 9, 9), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        # Added: batch normalization layer
        with tf.variable_scope("bnorm"):
            layer = tf.layers.BatchNormalization(axis=2, momentum=0.99, epsilon=0.001, center=True, scale=True,
                    beta_initializer='zeros', gamma_initializer='ones',
                    moving_mean_initializer='zeros', moving_variance_initializer='ones',
                    beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
                    gamma_constraint=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99,
                    fused=None, trainable=True, virtual_batch_size=None, adjustment=None, name=None)
            tensor = layer(tensor)

        with tf.variable_scope("layer_1"):
            layer = tf.layers.Conv3D(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        with tf.variable_scope("layer_2"):
            layer = tf.layers.Conv3D(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=False, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        with tf.variable_scope("layer_3"):
            layer = tf.layers.Conv3D(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=False, activation=None, data_format=data_format)
            tensor = layer(tensor)

    return tensor

def synthesis_transform(tensor, num_filters, data_format):
    with tf.variable_scope("synthesis"):
        with tf.variable_scope("layer_0"):
            layer = tf.layers.Conv3DTranspose(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        with tf.variable_scope("layer_1"):
            layer = tf.layers.Conv3DTranspose(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        with tf.variable_scope("layer_2"):
            layer = tf.layers.Conv3DTranspose(
                    num_filters, (5, 5, 5), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

        with tf.variable_scope("layer_3"):
            layer = tf.layers.Conv3DTranspose(
                    1, (9, 9, 9), strides=(2, 2, 2), padding="same",
                    use_bias=True, activation=tf.nn.relu, data_format=data_format)
            tensor = layer(tensor)

    return tensor

mauriceqch commented 4 years ago

I think the issue is that axis should be 1 (the channels axis) instead of 2 since the model is in channels_first mode.

yansir-X commented 4 years ago

> Also, maybe you are still using the path to the old model instead of the new one when using ?

I am using the new model.

> Can you give me the commands you used before encountering this error ?

What do you mean? It's just:

    python "../data/ModelNet40_pc_64/*/.ply" ../models/Model256_new --resolution 64 --lmbda 0.000001
    python ../data/m40/ "*/.ply" ../data/msft_bin_256new ../models/Model256_new --resolution 256

mauriceqch commented 4 years ago

Ok, so I am pretty sure that the axis is the issue. After the first convolution (stride 2), the size along each spatial axis is 256 / 2 = 128 at compression time (resolution 256), which conflicts with 64 / 2 = 32 at training time (resolution 64). Since BatchNorm creates its variables with the size of the normalized axis, it should be on axis=1 (the channels axis), as the model is in channels_first mode, not on axis=2 (a spatial axis).
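The arithmetic above can be checked with a small shape calculation. This is only a sketch in plain Python, not the repository's code: the helper names are hypothetical, `num_filters = 32` is an assumed value, and the input is assumed to be a single-channel voxel grid in channels_first layout `(N, C, D, H, W)`. It relies on two facts: a "same"-padded convolution with stride `s` produces `ceil(size / s)` along each spatial axis, and batch normalization creates per-axis variables (gamma, beta, moving statistics) whose length is the size of the normalized axis.

```python
import math


def conv3d_same_output_shape(input_shape, num_filters, strides):
    """Output shape of a 'same'-padded 3D convolution on a channels_first
    tensor shaped (N, C, D, H, W)."""
    batch, _channels, *spatial = input_shape
    out_spatial = [math.ceil(size / stride) for size, stride in zip(spatial, strides)]
    return (batch, num_filters, *out_spatial)


def batch_norm_param_shape(tensor_shape, axis):
    """Shape of the variables batch normalization creates for `axis`."""
    return (tensor_shape[axis],)


num_filters = 32  # assumed value, for illustration only
strides = (2, 2, 2)

train_out = conv3d_same_output_shape((1, 1, 64, 64, 64), num_filters, strides)
compress_out = conv3d_same_output_shape((1, 1, 256, 256, 256), num_filters, strides)

# axis=2 is a spatial axis: the variable size follows the resolution.
print(batch_norm_param_shape(train_out, axis=2))     # (32,)
print(batch_norm_param_shape(compress_out, axis=2))  # (128,)

# axis=1 is the channels axis: the variable size is num_filters in both cases.
print(batch_norm_param_shape(train_out, axis=1))     # (32,)
print(batch_norm_param_shape(compress_out, axis=1))  # (32,)
```

With axis=2 the checkpoint written at resolution 64 holds variables of shape [32], while the graph built at resolution 256 expects [128], which matches the shapes in the error message. With axis=1 the shapes agree regardless of resolution.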

yansir-X commented 4 years ago

You are right, Mr. Quach! Thanks for your help!

mauriceqch commented 4 years ago

No problem, happy to help!