optas / latent_3d_points

Auto-encoding & Generating 3D Point-Clouds.

NameError: global name 'nn_distance' is not defined #21

Closed sohee-zoe closed 4 years ago

sohee-zoe commented 5 years ago
reset_tf_graph()
ae = PointNetAutoEncoder(conf.experiment_name, conf)

/home/~~/latent_3d_points/src/point_net_ae.py in _create_loss(self)

[in the case c.loss == 'chamfer':]
---> cost_p1_p2, _, cost_p2_p1, _ = nn_distance(self.x_reconstr, self.gt)
     self.loss = tf.reduce_mean(cost_p1_p2) + tf.reduce_mean(cost_p2_p1)
     elif c.loss == 'emd':

NameError: global name 'nn_distance' is not defined
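
A NameError at call time, rather than an ImportError at startup, usually suggests that the import of the compiled op failed earlier and the failure was swallowed. The thread does not show the import code, so the following is only a minimal sketch of that pattern under that assumption; the module name `compiled_losses_stub` is hypothetical and deliberately does not exist.

```python
# Sketch only: `compiled_losses_stub` is a hypothetical module that does not
# exist, standing in for a compiled op wrapper that fails to import.
try:
    from compiled_losses_stub import nn_distance
except ImportError:
    # The failure is reported but not re-raised, so execution continues.
    print('External losses could not be loaded.')


def create_loss(x_reconstr, gt):
    # nn_distance was never bound, so this raises NameError when called.
    return nn_distance(x_reconstr, gt)


def name_error_message():
    """Call create_loss and return the NameError text, or None if it succeeds."""
    try:
        create_loss([0.0], [0.0])
    except NameError as err:
        return str(err)
    return None


print(name_error_message())
```

If this is what is happening, the real problem is whatever made the import fail, which is why loading the shared library directly is a useful next step.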

lychan110 commented 5 years ago

I had the same issue and isolated the problem by creating a test file in the same directory as the losses, latent_3d_points/external/structural_losses:

import tensorflow as tf
nn_distance_module = tf.load_op_library('./tf_nndistance_so.so')

Running this gave the error: NotFoundError: undefined symbol: _ZTIN10tensorflow8OpKernelE
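
The mangled name follows the Itanium C++ ABI: `_ZTI` marks a typeinfo object and `N...E` wraps a nested name built from length-prefixed identifiers, so the missing symbol is the typeinfo for `tensorflow::OpKernel`. That is consistent with the fix further down, which links against `-ltensorflow_framework`. Below is a small sketch of a decoder for this one simple form; it is not a general demangler, and the function name is made up for illustration.

```python
def demangle_simple_typeinfo(symbol):
    """Decode symbols of the form _ZTIN<len><name>...E; return None otherwise."""
    prefix = "_ZTIN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        return None
    body = symbol[len(prefix):-1]
    parts = []
    i = 0
    while i < len(body):
        # Read the decimal length prefix of the next identifier.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            return None  # not a length-prefixed identifier
        length = int(body[i:j])
        name = body[j:j + length]
        if len(name) != length:
            return None  # declared length runs past the end of the symbol
        parts.append(name)
        i = j + length
    if not parts:
        return None
    return "typeinfo for " + "::".join(parts)


print(demangle_simple_typeinfo("_ZTIN10tensorflow8OpKernelE"))
# prints: typeinfo for tensorflow::OpKernel
```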

It seems to be caused by my versions of CUDA and TensorFlow being newer than the ones the author tested on (mine are 9.0 and 1.11.0). These helped me solve the problem: these answers, TensorFlow's guide, and this answer.

I could not use the solutions given in the above articles directly because, for some reason, my system (Linux Ubuntu 16.04 LTS) would give the error "No such file or directory" for ${TF_CFLAGS[@]} and ${TF_LFLAGS[@]}. It needs a space after -L and -I. Here's how I fixed it:

First I checked the outputs of:

from __future__ import print_function  # __future__ imports must come before any other import
import tensorflow as tf
print(tf.sysconfig.get_compile_flags(),'\n')
print(tf.sysconfig.get_link_flags())

which for me was

['-I/usr/local/lib/python2.7/dist-packages/tensorflow/include', '-I/usr/local/lib/python2.7/dist-packages/tensorflow/include/external/nsync/public', '-D_GLIBCXX_USE_CXX11_ABI=1'] 

['-L/usr/local/lib/python2.7/dist-packages/tensorflow', '-ltensorflow_framework']

I manually replaced what was -I $(tensorflow) with the include paths from the first line, added the link flags from the second line to the g++ commands, and changed -D_GLIBCXX_USE_CXX11_ABI=0 to -D_GLIBCXX_USE_CXX11_ABI=1. My working makefile is:

nvcc=/usr/local/cuda-9.0/bin/nvcc
cudalib=/usr/local/cuda-9.0/lib64
nsync=/usr/local/lib/python2.7/dist-packages/tensorflow/include/external/nsync/public
TF_INC=/usr/local/lib/python2.7/dist-packages/tensorflow/include
TF_LIB=/usr/local/lib/python2.7/dist-packages/tensorflow/

all: tf_approxmatch_so.so tf_approxmatch_g.cu.o tf_nndistance_so.so tf_nndistance_g.cu.o

tf_approxmatch_so.so: tf_approxmatch_g.cu.o tf_approxmatch.cpp
    g++ -std=c++11 tf_approxmatch.cpp tf_approxmatch_g.cu.o -o tf_approxmatch_so.so -shared -fPIC -I $(TF_INC) -I $(nsync) -lcudart -L $(cudalib) -L $(TF_LIB) -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=1

tf_approxmatch_g.cu.o: tf_approxmatch_g.cu
    $(nvcc) -std=c++11 -c -o tf_approxmatch_g.cu.o tf_approxmatch_g.cu -I $(TF_INC) -I $(nsync) -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -O2 -D_GLIBCXX_USE_CXX11_ABI=1

tf_nndistance_so.so: tf_nndistance_g.cu.o tf_nndistance.cpp
    g++ -std=c++11 tf_nndistance.cpp tf_nndistance_g.cu.o -o tf_nndistance_so.so -shared -fPIC -I $(TF_INC) -I $(nsync) -lcudart -L $(cudalib) -L $(TF_LIB) -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=1

tf_nndistance_g.cu.o: tf_nndistance_g.cu
    $(nvcc) -std=c++11 -c -o tf_nndistance_g.cu.o tf_nndistance_g.cu -I $(TF_INC) -I $(nsync) -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -O2 -D_GLIBCXX_USE_CXX11_ABI=1

clean:
    rm tf_approxmatch_so.so
    rm tf_nndistance_so.so
    rm  *.cu.o 

After re-compiling the losses, everything works.
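
The manual substitution described above amounts to stripping the `-I`, `-L`, `-l` and `-D` prefixes from the reported flags and pasting the remainders into the Makefile. Here is a minimal sketch of doing that split in Python; the helper name `split_flags` is made up for illustration, and the input lists are copied from the output shown earlier rather than read from a live TensorFlow install.

```python
def split_flags(compile_flags, link_flags):
    """Separate -I/-L/-l/-D flags into the pieces a Makefile needs."""
    result = {"include_dirs": [], "lib_dirs": [], "libs": [], "defines": []}
    for flag in list(compile_flags) + list(link_flags):
        if flag.startswith("-I"):
            result["include_dirs"].append(flag[2:])
        elif flag.startswith("-L"):
            result["lib_dirs"].append(flag[2:])
        elif flag.startswith("-l"):
            result["libs"].append(flag[2:])
        elif flag.startswith("-D"):
            result["defines"].append(flag[2:])
    return result


# Values copied from the tf.sysconfig output quoted earlier in this comment.
compile_flags = [
    "-I/usr/local/lib/python2.7/dist-packages/tensorflow/include",
    "-I/usr/local/lib/python2.7/dist-packages/tensorflow/include/external/nsync/public",
    "-D_GLIBCXX_USE_CXX11_ABI=1",
]
link_flags = [
    "-L/usr/local/lib/python2.7/dist-packages/tensorflow",
    "-ltensorflow_framework",
]

parts = split_flags(compile_flags, link_flags)
print("TF_INC=" + parts["include_dirs"][0])
print("nsync=" + parts["include_dirs"][1])
print("TF_LIB=" + parts["lib_dirs"][0])
print("libs: " + " ".join("-l" + name for name in parts["libs"]))
print("defines: " + " ".join("-D" + d for d in parts["defines"]))
```

With TensorFlow installed, the two lists could instead come from `tf.sysconfig.get_compile_flags()` and `tf.sysconfig.get_link_flags()`, the same calls used above.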