yangyanli / PointCNN

PointCNN: Convolution On X-Transformed Points (NeurIPS 2018)
https://arxiv.org/abs/1801.07791

tf_sampling_so.so error while training #223

Closed NiranjanRavi1993 closed 4 years ago

NiranjanRavi1993 commented 4 years ago

Hi, I followed the steps for the Semantic3D dataset and used a custom dataset for training. I was able to create the .h5 files, and all preparation steps were successful. But when I run ./train_val_semantic3d.sh -g 0 -x semantic3d_x4_2048_fps, the log file inside models/seg shows the following error:
tf_sampling_so.so: cannot open shared object file: No such file or directory

I checked the existing issues (https://github.com/charlesq34/pointnet2/issues/48) and made changes to Pointcnn/sampling/tf_sampling_compiler.sh, but it still did not work.

I am using TensorFlow 1.15 and Python 3.6 in a conda environment. (I used pip to install TensorFlow, as mentioned in one of the issues, but it still didn't work.) Any help on how to resolve this issue?

Regards,
Niranjan

sayakgis commented 4 years ago

@NiranjanRavi1993: Please check this issue; I could compile using the steps there. I am not sure about Python 3.6, as the author has advised downgrading to 3.5, and that worked for me.

https://github.com/yangyanli/PointCNN/issues/182

NiranjanRavi1993 commented 4 years ago

@sayakgis Hi, thank you for your reply. I tried the steps in the link you mentioned, but the same issue persists. My versions:

Python - 3.5.6
TF - 1.10.1
CUDA - 9.0
Conda - 4.8.3

I created new environments and tried the above, but it still did not work. Is there anything wrong with my approach, or is the combination of versions the issue?

.sh script file:

#!/bin/bash

PYTHON=python3
CUDA_PATH=/usr/local/cuda
TF_LIB=$($PYTHON -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
PYTHON_VERSION=$($PYTHON -c 'import sys; print("%d.%d"%(sys.version_info[0], sys.version_info[1]))')
TF_PATH=$TF_LIB/include

$CUDA_PATH/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -L$TF_LIB -ltensorflow_framework -I $TF_PATH/external/nsync/public/ -I $TF_PATH -I $CUDA_PATH/include -lcudart -L $CUDA_PATH/lib64/ -O2
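As an aside on the script above: it hard-codes the include and library flags. TensorFlow 1.x also exposes tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags(), which report the flags matching the installed build. The sketch below is not the PointCNN authors' script; it only illustrates how the g++ line could be assembled from those values. The helper name build_link_command and the default CUDA path are assumptions made for illustration.

```python
def build_link_command(sources, output, compile_flags, link_flags,
                       cuda_path="/usr/local/cuda"):
    """Assemble a g++ command line for a TensorFlow custom-op shared library.

    build_link_command is a hypothetical helper, not part of PointCNN.
    compile_flags and link_flags are expected to come from
    tf.sysconfig.get_compile_flags() and tf.sysconfig.get_link_flags().
    """
    return (
        ["g++", "-std=c++11", "-shared", "-fPIC", "-O2"]
        + list(sources)
        + ["-o", output]
        + list(compile_flags)
        + ["-I", cuda_path + "/include"]
        + list(link_flags)
        + ["-L", cuda_path + "/lib64", "-lcudart"]
    )


if __name__ == "__main__":
    # Requires a working TensorFlow install in the active environment.
    import tensorflow as tf

    cmd = build_link_command(
        ["tf_sampling.cpp", "tf_sampling_g.cu.o"],  # file names from the script above
        "tf_sampling_so.so",
        tf.sysconfig.get_compile_flags(),
        tf.sysconfig.get_link_flags(),
    )
    # Print the command for inspection rather than running it.
    print(" ".join(cmd))
```

Printing the command first makes it easy to compare against the hand-written g++ line and spot a mismatched include or library directory.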

Thank you.

Regards,
Niranjan

sayakgis commented 4 years ago

Could you please elaborate on what error you are getting?

NiranjanRavi1993 commented 4 years ago

@sayakgis Under model/seg/pointcnn_seg_semantic_3d_x4_2048_fps.txt, below is what I keep getting:

/home/iot/anaconda3/envs/test/PointCNN/data_utils.py:162: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  data = h5py.File(os.path.join(folder, line.strip()))
Traceback (most recent call last):
  File "../train_val_seg.py", line 311, in <module>
    main()
  File "../train_val_seg.py", line 136, in main
    net = model.Net(points_augmented, features_augmented, is_training, setting)
  File "/home/iot/anaconda3/envs/test/PointCNN/pointcnn_seg.py", line 11, in __init__
    PointCNN.__init__(self, points, features, is_training, setting)
  File "/home/iot/anaconda3/envs/test/PointCNN/pointcnn.py", line 64, in __init__
    from sampling import tf_sampling
  File "/home/iot/anaconda3/envs/test/PointCNN/sampling/tf_sampling.py", line 15, in <module>
    sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
  File "/home/iot/anaconda3/envs/test/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/iot/anaconda3/envs/test/PointCNN/sampling/tf_sampling_so.so: cannot open shared object file: No such file or directory

sayakgis commented 4 years ago

Did tf_compile.sh create tf_sampling_so.so? I can upload the .so file if you need it; it is on CUDA 9.2.
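A quick way to answer that question before starting a training run is to check whether the compiled library is actually on disk. This is a minimal sketch; the helper name find_sampling_library is hypothetical, and the "sampling" directory is assumed from the path shown in the traceback.

```python
import os


def find_sampling_library(base_dir, name="tf_sampling_so.so"):
    """Return the absolute path of the compiled op library, or None if missing.

    find_sampling_library is a hypothetical helper, not part of PointCNN.
    """
    path = os.path.join(os.path.abspath(base_dir), name)
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    # "sampling" is assumed to be relative to the PointCNN checkout root.
    lib = find_sampling_library("sampling")
    if lib is None:
        print("tf_sampling_so.so not found: run the compile script in sampling/ first.")
    else:
        print("Found compiled op library:", lib)
```

If the file is missing, the compile step either was never run or failed, and its own output is the place to look next.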

NiranjanRavi1993 commented 4 years ago

Hi @sayakgis, I realized my mistake. I was able to generate tf_sampling_so.so and use it. Training now runs successfully, at least in part, with my custom datasets. Thank you for your help.

sayakgis commented 4 years ago

Thanks for the update. Which setup/environment worked for you? I am asking for the benefit of a broader audience.

NiranjanRavi1993 commented 4 years ago

Yes:

Python - 3.5
CUDA - 9.0
GCC - 5.5
TF - 1.10.1
Conda - 4.8.3

This is the setup I had. All I had to do was run tf_compile.sh and start training my model. It worked perfectly fine.
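For anyone comparing their own environment against the versions listed above, the following sketch collects the same information in one place. The function names are hypothetical, and it assumes gcc, nvcc, and conda are on PATH when installed; any tool that cannot be found is reported as None.

```python
import shutil
import subprocess
import sys


def tool_version(executable, flag="--version"):
    """Return the output of `<executable> <flag>`, or None if it is unavailable."""
    if shutil.which(executable) is None:
        return None
    try:
        result = subprocess.run(
            [executable, flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


def environment_report():
    """Collect Python, GCC, CUDA compiler, conda, and TensorFlow versions."""
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "gcc": tool_version("gcc"),
        "nvcc": tool_version("nvcc"),
        "conda": tool_version("conda"),
    }
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
    except ImportError:
        # TensorFlow is not installed in the active environment.
        report["tensorflow"] = None
    return report


if __name__ == "__main__":
    for name, value in sorted(environment_report().items()):
        print("%s: %s" % (name, value))
```

Run it inside the same conda environment used for training, since the Python and TensorFlow versions it reports are those of the active interpreter.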