mauriceqch / pcc_geo_cnn

Learning Convolutional Transforms for Point Cloud Geometry Compression

Conv3DBackpropInputOpV2 only supports NDHWC on the CPU. #4

Open xtorker opened 4 years ago

xtorker commented 4 years ago

Bug log:

```
InvalidArgumentError (see above for traceback): Conv3DBackpropInputOpV2 only supports NDHWC on the CPU.
	 [[node synthesis/layer_0/conv3d_transpose/conv3d_transpose (defined at /home/chenghao/pcc_geo_cnn/src/compression_model.py:70) ]]
```


Hi, I followed the steps in the README to prepare the ModelNet40 dataset, but I ran into the error above when trying to train your model.

Environment: Ubuntu 16.04, Python 3.6.5, TensorFlow 1.13

I tried modifying the code to train the model on the GPU. Compressing and decompressing the ModelNet40 test set seems to work fine, but the results go wrong when I test the model on the MVUB dataset: each decompressed point cloud is a 512x512x512 cube completely filled with points.

Could you point out where I went wrong? Thanks in advance!

mauriceqch commented 4 years ago

Hi,

Thank you for your interest in our work.

Have you installed TensorFlow using the tensorflow-gpu package? In your error message, TensorFlow mentions that it is running on the CPU. The code uses data layouts optimized for the GPU (channels first).

You can double check your installation here: https://www.tensorflow.org/install/gpu

You can also check that a GPU is available using the following code:

```python
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
```
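
For reference, the data-layout issue can be reproduced in isolation. This is a minimal sketch (not code from the repository), assuming TensorFlow 1.13 and a standard CPU-only pip build: a transposed 3D convolution with a channels-first (NCDHW) input has no CPU kernel, while channels_last (NDHWC) works.

```python
import numpy as np
import tensorflow as tf

# NCDHW input: (batch, channels, depth, height, width).
x = tf.placeholder(tf.float32, shape=(1, 1, 8, 8, 8))

# On a CPU-only build this fails with
# "Conv3DBackpropInputOpV2 only supports NDHWC on the CPU";
# with data_format='channels_last' (and an NDHWC-shaped input) it runs.
y = tf.layers.conv3d_transpose(x, filters=8, kernel_size=5, strides=2,
                               padding='same', data_format='channels_first')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y, feed_dict={x: np.zeros((1, 1, 8, 8, 8), np.float32)})
```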
xtorker commented 4 years ago

Yes, I have installed tensorflow-gpu and have used it in other projects to train models.

What confuses me is input_fn() in both compression_model.py and decompress.py: you tell TensorFlow to use the CPU rather than the GPU. Also, I can't find any code that places operations on the GPU. The code below shows what I mean:

```python
# Set GPU
gpus = [GPU_INDEX]  # Here I set CUDA to only see one GPU
os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in gpus])

def train():
    with tf.Graph().as_default():
        # Place the graph operations explicitly on the chosen GPU.
        with tf.device('/gpu:' + str(GPU_INDEX)):
            ...
```

Am I digging in the wrong direction?

mauriceqch commented 4 years ago

To answer your questions: in input_fn(), the dataset operations are placed on the CPU because they preprocess the data (on the CPU) before it is loaded onto the GPU. In decompress.py, there is a performance issue at high resolutions, which is documented here: https://github.com/tensorflow/tensorflow/issues/25760.
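
The general pattern looks roughly like this (a minimal sketch, not the exact code from compression_model.py; load_and_preprocess is a hypothetical helper): the tf.data pipeline is pinned to the CPU, and only the resulting batches are consumed by ops placed on the GPU.

```python
import tensorflow as tf

def input_fn(files, batch_size):
    # Keep data loading and preprocessing on the CPU so it does not compete
    # with the model for GPU memory and compute.
    with tf.device('/cpu:0'):
        dataset = tf.data.Dataset.from_tensor_slices(files)
        dataset = dataset.map(load_and_preprocess, num_parallel_calls=4)  # hypothetical helper
        dataset = dataset.batch(batch_size).prefetch(1)
    # The batches produced here then feed the model ops, which sit on the GPU.
    return dataset
```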

If I understand correctly, you are encountering this error when decompressing point clouds. This is probably because of this line in decompress.py: os.environ['CUDA_VISIBLE_DEVICES'] = ''

On my configuration, I compiled TensorFlow from source with Intel MKL support (https://www.tensorflow.org/install/source#configure_the_build), which enables channels-first convolutions on the CPU. You could try one of these two options: build TensorFlow with MKL support as well, or comment out that line so that decompression runs on the GPU.
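
If it helps, one way to make the same script work on both setups is to pick the data format at runtime. This is a hypothetical sketch, not how the repository handles it:

```python
from tensorflow.python.client import device_lib

def pick_data_format():
    """Use channels_first (NCDHW) only when a GPU is visible; a plain CPU
    build (without MKL) only ships NDHWC kernels for 3D convolutions."""
    gpus = [d for d in device_lib.list_local_devices() if d.device_type == 'GPU']
    return 'channels_first' if gpus else 'channels_last'
```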

xtorker commented 4 years ago

Thanks for your explanation about input_fn().

I tried commenting out that line in decompress.py (os.environ['CUDA_VISIBLE_DEVICES'] = ''). It does slow down the process significantly, but I will use other datasets with lower resolution instead, so the impact is acceptable for my project.

Thank you again for helping me find the solutions. :)

diksha7869 commented 2 years ago

I was running the code of a 3D CNN and got this error:

```
InvalidArgumentError: Conv3DBackpropFilterOpV2 only supports NDHWC on the CPU.
	 [[node gradient_tape/sequential_6/conv3d_16/Conv3D/Conv3DBackpropFilterV2 (defined at :23) ]] [Op:__inference_train_function_4107]
```

```python
model.add(Dense(nb_classes, init='normal'))
model.add(Activation('softmax'))
opt = adam(lr=0.0001)

# Compile the model
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Splitting the dataset for testing and training
X_train_new, X_val_new, y_train_new, y_val_new = train_test_split(
    train_set, Y_train, test_size=0.2, random_state=4)

# Training the model along with creating a TensorBoard callback
# for graphical visualization of the training process
tbcallback = keras.callbacks.TensorBoard(log_dir='/workspace/dgx1/keras/40g', histogram_freq=0,
                                         write_graph=True, write_images=True)
hist = model.fit(X_train_new, y_train_new, validation_data=(X_val_new, y_val_new),
                 batch_size=batch_size, nb_epoch=nb_epoch, shuffle=True, verbose=1,
                 callbacks=[tbcallback])
```
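
This looks like the same root cause as the original issue: the 3D convolutions end up running on a CPU with a channels-first layout. A hedged sketch of one workaround, assuming tf.keras (the fragment above does not show how the Conv3D layers are built, so layer sizes here are placeholders): force a channels-last layout, or run on a GPU.

```python
from tensorflow import keras
from tensorflow.keras import backend as K

# Standard CPU builds only provide NDHWC (channels_last) kernels for 3D convs.
K.set_image_data_format('channels_last')

model = keras.Sequential([
    # Each sample is (depth, height, width, channels), i.e. NDHWC;
    # the shape below is only an illustrative placeholder.
    keras.layers.Conv3D(16, kernel_size=3, activation='relu',
                        data_format='channels_last',
                        input_shape=(16, 64, 64, 1)),
    keras.layers.GlobalAveragePooling3D(),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy', metrics=['accuracy'])
```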