Open xtorker opened 4 years ago
Hi,
Thank you for your interest in our work.
Have you installed tensorflow using the tensorflow-gpu package? In your error message, TensorFlow mentions that it is using the CPU. The code uses data layouts optimized for the GPU (channels first).
You can double check your installation here: https://www.tensorflow.org/install/gpu
You can also check that a GPU is available using the following code:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
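As an extra check (a minimal sketch, assuming a TensorFlow 1.x install), tf.test.is_gpu_available() should print True when the GPU build is working:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if TensorFlow can actually use a CUDA-capable GPU.
print(tf.test.is_gpu_available())

# Names of every device visible to TensorFlow (CPU plus any GPUs).
print([d.name for d in device_lib.list_local_devices()])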
Yes. I have installed tensorflow-gpu and use it on other projects to train models.
I am confused by input_fn() in both compression_model.py and decompress.py. You tell TensorFlow to use the CPU rather than the GPU, and I can't find any code that places operations on the GPU. The code below is what I mean.
# Set GPU
gpus = [GPU_INDEX]  # Here I set CUDA to only see one GPU
os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in gpus])

def train():
    with tf.Graph().as_default():
        with tf.device('/gpu:' + str(GPU_INDEX)):
Am I digging in the wrong direction?
To answer your questions: in input_fn(), the dataset operations are placed on the CPU because they preprocess the data before it is loaded onto the GPU. In decompress.py, there is a performance issue at high resolutions, which is documented here: https://github.com/tensorflow/tensorflow/issues/25760
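As a rough illustration of that pattern (a minimal sketch, not the repository's actual input_fn; parse_fn is a hypothetical preprocessing step), the tf.data pipeline is pinned to the CPU so that only the prefetched batches reach the GPU:

import tensorflow as tf

def parse_fn(x):
    # Hypothetical preprocessing step running on the CPU.
    return tf.cast(x, tf.float32)

def input_fn(data, batch_size=32):
    # Keep the tf.data preprocessing on the CPU; the model itself runs on the GPU.
    with tf.device('/cpu:0'):
        dataset = tf.data.Dataset.from_tensor_slices(data)
        dataset = dataset.map(parse_fn, num_parallel_calls=4)
        dataset = dataset.batch(batch_size)
        dataset = dataset.prefetch(1)  # overlap CPU preprocessing with GPU compute
    return dataset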
If I understand correctly, you are encountering this error when decompressing point clouds.
This is probably because of this line in decompress.py:
os.environ['CUDA_VISIBLE_DEVICES'] = ''
On my configuration, I compiled TensorFlow from source with Intel MKL support (https://www.tensorflow.org/install/source#configure_the_build), which enables channels first on the CPU. You could try one of these two: comment out that line so that decompression runs on the GPU, or build TensorFlow from source with MKL support so that channels-first convolutions also work on the CPU.
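For the first option, a minimal sketch of what the change could look like (use_gpu is a hypothetical flag, not something in the repository, which simply hard-codes the line):

import os

use_gpu = True  # hypothetical flag; decompress.py currently hard-codes the CPU path
if not use_gpu:
    # Hiding all GPUs forces decompression onto the CPU (must run before TensorFlow initializes CUDA).
    os.environ['CUDA_VISIBLE_DEVICES'] = ''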
Thanks for your explanation about input_fn().
I have tried commenting out the line in decompress.py:
os.environ['CUDA_VISIBLE_DEVICES'] = ''
It indeed slows down the process significantly.
But I will try other datasets with lower resolution instead, so the impact should be acceptable for my project.
Thank you again for helping me find the solutions. :)
InvalidArgumentError: Conv3DBackpropFilterOpV2 only supports NDHWC on the CPU.
[[node gradient_tape/sequential_6/conv3d_16/Conv3D/Conv3DBackpropFilterV2 (defined at

model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
X_train_new, X_val_new, y_train_new, y_val_new = train_test_split(train_set, Y_train, test_size=0.2, random_state=4)
tbcallback = keras.callbacks.TensorBoard(log_dir='/workspace/dgx1/keras/40g', histogram_freq=0, write_graph=True, write_images=True)
hist = model.fit(X_train_new, y_train_new, validation_data=(X_val_new, y_val_new), batch_size=batch_size, nb_epoch=nb_epoch, shuffle=True, verbose=1, callbacks=[tbcallback])
Bug log:
InvalidArgumentError (see above for traceback): Conv3DBackpropInputOpV2 only supports NDHWC on the CPU.
[[node synthesis/layer_0/conv3d_transpose/conv3d_transpose (defined at /home/chenghao/pcc_geo_cnn/src/compression_model.py:70) ]]
Hi, I followed the steps in the README to prepare the ModelNet40 dataset, but I ran into the problem above when trying to train your model.
Environment: Ubuntu 16.04, Python 3.6.5, TensorFlow 1.13
I have tried modifying the code to train the model on the GPU. It seems fine when I compress/decompress the ModelNet40 test set, but the decompressed point clouds go wrong when I test the model on the MVUB dataset: each decompressed point cloud is a cube completely full of points with shape 512×512×512.
Maybe you can point out where I did something wrong. Thanks in advance!
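For reference, the error in the bug log comes from running a channels-first (NCDHW) 3D transposed convolution on a CPU-only TensorFlow build. A minimal sketch (assuming TensorFlow 1.x; not the repository's code) that exercises the same op and layout:

import numpy as np
import tensorflow as tf

# Channels-first 3D transposed convolution, the same op as the synthesis layer in compression_model.py.
# Without a visible GPU (or an MKL build), running this typically raises the
# "Conv3DBackpropInputOpV2 only supports NDHWC on the CPU" error seen above.
x = tf.placeholder(tf.float32, shape=[1, 8, 16, 16, 16])  # NCDHW layout
y = tf.layers.conv3d_transpose(x, filters=4, kernel_size=3, strides=2,
                               padding='same', data_format='channels_first')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.zeros([1, 8, 16, 16, 16], np.float32)})
    print(out.shape)  # (1, 4, 32, 32, 32) when a GPU is available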