sampepose / flownet2-tf

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
MIT License

Issue compiling on Ubuntu and tensorflow 1.4 #28

Open clausmichele opened 6 years ago

clausmichele commented 6 years ago

Dear all,

I have an issue compiling the code with tensorflow 1.4. I already solved the problem of the missing cuda_config.h by following a previously solved issue. Here is the output of make all:

make all
nvcc -g -std=c++11 -I`python -c "import tensorflow; print(tensorflow.sysconfig.get_include())"` -I"/usr/local/cuda/include" -DGOOGLE_CUDA=1 -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__ -D_GLIBCXX_USE_CXX11_ABI=0 -c -gencode=arch=compute_30,code=sm_30 src/ops/preprocessing/kernels/data_augmentation.cu.cc -x cu -Xcompiler -fPIC -o src/ops/build/data_augmentation.o

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1265): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("real") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function("imag") from a host device function("abs") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/../../../Eigen/src/Core/MathFunctions.h(1270): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(133): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(138): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(208): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorRandom.h(213): warning: calling a constexpr host function from a host device function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/google/protobuf/arena_impl.h(52): warning: integer conversion resulted in a change of sign

/usr/local/lib/python2.7/dist-packages/tensorflow/include/google/protobuf/arena_impl.h(147): warning: integer conversion resulted in a change of sign

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(572): error: calling a constexpr host function("real") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(572): error: calling a constexpr host function("imag") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(577): error: calling a constexpr host function("real") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/usr/local/lib/python2.7/dist-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h(577): error: calling a constexpr host function("imag") from a device function("CudaAtomicSub") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

4 errors detected in the compilation of "/tmp/tmpxft_000009d2_00000000-7_data_augmentation.cu.cpp1.ii".
Makefile:63: recipe for target 'preprocessing' failed
make: *** [preprocessing] Error 2

zhouqixian commented 6 years ago

Add the '--expt-relaxed-constexpr' flag when compiling with nvcc.

Here is my Makefile for tf1.4:

OUT_DIR = ./build

TF_INC = $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
TF_LIB = $(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
TF_NSYNC = $(TF_INC)/external/nsync/public
CUDA_HOME = /usr/local/cuda

GPUFLAGS = -I $(TF_INC) -I$(TF_NSYNC) -I$(CUDA_HOME)/include -I/usr/local -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
CFLAGS = -I $(TF_INC) -I$(TF_NSYNC) -fPIC -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L$(TF_LIB) -ltensorflow_framework

all: downsample.so flow_warp.so preprocessing.so correlation.so

downsample_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/downsample_kernel_gpu.o downsample/downsample_kernel_gpu.cu.cc \
	$(GPUFLAGS)

downsample.so: downsample_kernel_gpu.o
	g++ -std=c++11 -shared \
	-o $(OUT_DIR)/downsample.so \
	downsample/downsample_kernel.cc downsample/downsample_op.cc \
	$(OUT_DIR)/downsample_kernel_gpu.o \
	$(CFLAGS)

flow_warp_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/flow_warp_gpu.o flow_warp/flow_warp.cu.cc \
	$(GPUFLAGS)

flow_warp_grad_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/flow_warp_grad_gpu.o flow_warp/flow_warp_grad.cu.cc \
	$(GPUFLAGS)

flow_warp.so: flow_warp_gpu.o flow_warp_grad_gpu.o
	g++ -std=c++11 -shared \
	-o $(OUT_DIR)/flow_warp.so \
	flow_warp/flow_warp_op.cc flow_warp/flow_warp.cc flow_warp/flow_warp_grad.cc \
	$(OUT_DIR)/flow_warp_gpu.o $(OUT_DIR)/flow_warp_grad_gpu.o \
	$(CFLAGS)

data_augmentation.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/data_augmentation.o preprocessing/kernels/data_augmentation.cu.cc \
	$(GPUFLAGS)

flow_augmentation_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/flow_augmentation_gpu.o preprocessing/kernels/flow_augmentation_gpu.cu.cc \
	$(GPUFLAGS)

preprocessing.so: data_augmentation.o flow_augmentation_gpu.o
	g++ -std=c++11 -shared \
	-o $(OUT_DIR)/preprocessing.so \
	preprocessing/preprocessing.cc preprocessing/kernels/flow_augmentation.cc \
	preprocessing/kernels/augmentation_base.cc preprocessing/kernels/data_augmentation.cc \
	$(OUT_DIR)/data_augmentation.o $(OUT_DIR)/flow_augmentation_gpu.o \
	$(CFLAGS)

correlation_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/correlation_kernel_gpu.o correlation/correlation_kernel.cu.cc \
	$(GPUFLAGS)

correlation_grad_kernel_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/correlation_grad_kernel_gpu.o correlation/correlation_grad_kernel.cu.cc \
	$(GPUFLAGS)

correlation_pad_gpu.o:
	nvcc -std=c++11 -c --expt-relaxed-constexpr --gpu-architecture=sm_52 \
	-o $(OUT_DIR)/correlation_pad_gpu.o correlation/pad.cu.cc \
	$(GPUFLAGS)

correlation.so: correlation_kernel_gpu.o correlation_grad_kernel_gpu.o correlation_pad_gpu.o
	g++ -std=c++11 -shared \
	-o $(OUT_DIR)/correlation.so \
	correlation/correlation_kernel.cc correlation/correlation_grad_kernel.cc correlation/correlation_op.cc \
	$(OUT_DIR)/correlation_kernel_gpu.o $(OUT_DIR)/correlation_grad_kernel_gpu.o $(OUT_DIR)/correlation_pad_gpu.o \
	$(CFLAGS)

clean:
	rm -f $(OUT_DIR)/*
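Every nvcc rule in the Makefile above passes the same flags and differs only in the source and output file. As a rough illustration, the sketch below assembles that argument list in Python. The helper name nvcc_command and the placeholder include path are hypothetical; on a real machine the include directory would come from tf.sysconfig.get_include(), as in the Makefile.

```python
def nvcc_command(source, output, tf_include,
                 arch="sm_52", cuda_home="/usr/local/cuda"):
    """Build the nvcc argument list for one .cu.cc kernel.

    Hypothetical helper: it mirrors the flags of the Makefile above and
    does not run the compiler.
    """
    return [
        "nvcc", "-std=c++11", "-c",
        "--expt-relaxed-constexpr",      # the flag the nvcc error message asks for
        "--gpu-architecture=" + arch,
        "-o", output, source,
        "-I", tf_include,
        "-I" + tf_include + "/external/nsync/public",
        "-I" + cuda_home + "/include",
        "-I/usr/local",
        "-D", "GOOGLE_CUDA=1",
        "-x", "cu", "-Xcompiler", "-fPIC",
    ]


if __name__ == "__main__":
    # "/path/to/tensorflow/include" is a placeholder, not a real location.
    cmd = nvcc_command(
        "correlation/correlation_kernel.cu.cc",
        "./build/correlation_kernel_gpu.o",
        "/path/to/tensorflow/include",
    )
    print(" ".join(cmd))
```

The sm_52 default is copied from the Makefile; it should match the compute capability of the GPU that will run the kernels.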

clarkren commented 6 years ago

@zhouqixian I used the "Makefile for tf1.4", but got the error "Makefile:15: *** missing separator. Stop." I just don't know what's wrong.

oneTimePad commented 6 years ago

@clarkren That error occurs because the lines beneath each line ending in a colon (for example downsample_kernel_gpu.o:) must be indented with a tab.
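A Makefile pasted from a web page usually has its tabs turned into spaces, which is what triggers "missing separator". The sketch below is a heuristic check, not a Makefile parser: it reports indented lines that follow a rule header but do not start with a tab. The function name untabbed_recipe_lines is made up for this example.

```python
def untabbed_recipe_lines(makefile_text):
    """Return 1-based numbers of lines that look like recipe lines
    (indented, directly under a 'target:' header) but lack a leading tab.

    Heuristic only: a blank line is treated as the end of a rule.
    """
    bad = []
    in_rule = False
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            in_rule = False
            continue
        if not line[0].isspace():
            # Unindented line: a rule header has a ':' that is not part of
            # a variable assignment such as 'X = a:b' or 'X := 1'.
            head, sep, tail = stripped.partition(":")
            in_rule = bool(sep) and "=" not in head and not tail.startswith("=")
            continue
        if in_rule and not line.startswith("\t"):
            bad.append(number)
    return bad


if __name__ == "__main__":
    sample = "all: a.o\n    gcc -c a.c\n\nclean:\n\trm -f a.o\n"
    print(untabbed_recipe_lines(sample))
```

Any line number it reports needs its leading spaces replaced by a single tab character.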

@zhouqixian Even with this Makefile I am still receiving the error ""__CUDACC_VER__" is no longer supported". I am using tf1.7 and Cuda9.0. I read that this issue was supposedly fixed in TF a while back.

CQFIO commented 6 years ago

I finally made it work, but only in TF 1.2. I could not make it run in TF 1.7.

zhouqixian commented 6 years ago

@oneTimePad It works for me when using tf1.4, cuda8.0 and cudnnV6. NVCC is the Cuda compiler, so errors may occur when you use a newer Cuda version (such as Cuda 9.0) with this code. If you only want to use the flow-warping operation in tf, here is code that does it without any custom operation.

import tensorflow as tf


def get_pixel_value(img, x, y):
    """
    Utility function to get pixel values for coordinate tensors x and y
    from a 4D image tensor.

    Input
    -----
    - img: tensor of shape (B, H, W, C)
    - x: int32 tensor of shape (B, H, W) with column indices
    - y: int32 tensor of shape (B, H, W) with row indices

    Returns
    -------
    - output: tensor of shape (B, H, W, C)
    """
    shape = tf.shape(x)
    batch_size = shape[0]
    height = shape[1]
    width = shape[2]

    # batch index for every pixel, shape (B, H, W)
    batch_idx = tf.range(0, batch_size)
    batch_idx = tf.reshape(batch_idx, (batch_size, 1, 1))
    b = tf.tile(batch_idx, (1, height, width))

    indices = tf.stack([b, y, x], 3)

    return tf.gather_nd(img, indices)


def tf_warp(img, flow, H, W):
    """
    Input:
    img: [B, H, W, C] of float32
    flow: [B, H, W, 2] of float32 (channel 0 is the x offset, channel 1 the y offset)
    """
    # [B, H, W, 2] -> [B, 2, H, W]
    flow = tf.transpose(flow, [0, 3, 1, 2])

    # pixel coordinate grid of shape [1, 2, H, W]
    x, y = tf.meshgrid(tf.range(W), tf.range(H))
    x = tf.expand_dims(x, 0)
    x = tf.expand_dims(x, 0)

    y = tf.expand_dims(y, 0)
    y = tf.expand_dims(y, 0)

    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    grid = tf.concat([x, y], axis=1)

    # sampling positions: grid plus flow
    flows = grid + flow
    max_y = tf.cast(H - 1, tf.int32)
    max_x = tf.cast(W - 1, tf.int32)
    zero = tf.zeros([], dtype=tf.int32)

    x = flows[:, 0, :, :]
    y = flows[:, 1, :, :]

    x = tf.clip_by_value(x, tf.cast(zero, tf.float32), tf.cast(max_x, tf.float32))
    y = tf.clip_by_value(y, tf.cast(zero, tf.float32), tf.cast(max_y, tf.float32))

    # integer corners around each sampling position
    x0 = x
    y0 = y
    x0 = tf.cast(x0, tf.int32)
    x1 = x0 + 1
    y0 = tf.cast(y0, tf.int32)
    y1 = y0 + 1

    # clip to [0, W-1] and [0, H-1] to not violate img boundaries
    x0 = tf.clip_by_value(x0, zero, max_x)
    x1 = tf.clip_by_value(x1, zero, max_x)
    y0 = tf.clip_by_value(y0, zero, max_y)
    y1 = tf.clip_by_value(y1, zero, max_y)

    # get pixel value at corner coords
    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    # recast as float for delta calculation
    x0 = tf.cast(x0, tf.float32)
    x1 = tf.cast(x1, tf.float32)
    y0 = tf.cast(y0, tf.float32)
    y1 = tf.cast(y1, tf.float32)

    # calculate deltas (bilinear weights)
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)

    # add dimension for addition
    wa = tf.expand_dims(wa, axis=3)
    wb = tf.expand_dims(wb, axis=3)
    wc = tf.expand_dims(wc, axis=3)
    wd = tf.expand_dims(wd, axis=3)

    # compute output
    out = tf.add_n([wa * Ia, wb * Ib, wc * Ic, wd * Id])
    return out

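
The warp above is bilinear interpolation applied to every pixel at once. To make the four weights easier to follow, here is a minimal pure-Python sketch for a single sample on a small 2D grid. It is an illustration, not the code from this thread: the name bilinear_sample is made up, and the weights are written with fractional offsets (fx, fy) rather than (x1 - x), so they still sum to one when the sample lies exactly on the last row or column.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2D grid (a list of rows) at the fractional position (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp the sampling position to the image, like the float clip_by_value calls.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    # Integer corners around the position, clamped to the image.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    # Fractional offsets inside the cell.
    fx, fy = x - x0, y - y0
    wa = (1.0 - fx) * (1.0 - fy)  # weight of corner (x0, y0)
    wb = (1.0 - fx) * fy          # weight of corner (x0, y1)
    wc = fx * (1.0 - fy)          # weight of corner (x1, y0)
    wd = fx * fy                  # weight of corner (x1, y1)
    return (wa * img[y0][x0] + wb * img[y1][x0]
            + wc * img[y0][x1] + wd * img[y1][x1])


if __name__ == "__main__":
    grid = [[0.0, 10.0],
            [20.0, 30.0]]
    print(bilinear_sample(grid, 0.5, 0.5))  # centre of the four pixels
```

Sampling at the centre of the 2x2 grid averages the four corners, which is an easy sanity check for any warp implementation.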
shoutashi commented 6 years ago

Hi @zhouqixian, I have tried your Makefile (tf 1.4.1, cuda 8.0, cudnn v6 and python 3.5). Compilation succeeds, but when testing I hit the problem "correlation.so: undefined symbol:_ZTIN10tensorflow8OpKernelE". Could you help me, please? Thank you in advance.

alisaaalehi commented 6 years ago

I'm getting the same error as @shoutashi : "correlation.so: undefined symbol:_ZTIN10tensorflow8OpKernelE". Could you please help me with this?

DehaiZhao commented 6 years ago

Hi @shoutashi, @alisaaalehi, I'm facing the same problem: undefined symbol: _ZTIN10tensorflow8OpKernelE. Have you solved it? Thanks.

alisaaalehi commented 6 years ago

Hey @dehaisea, in my case removing -D_GLIBCXX_USE_CXX11_ABI=0 from the Makefile and rebuilding the project fixed it.

Iamanorange commented 6 years ago

@shoutashi @alisaaalehi @dehaisea For the error "correlation.so: undefined symbol:_ZTIN10tensorflow8OpKernelE", modify the Makefile:

TF_LIB = `python -c "import tensorflow; print(tensorflow.sysconfig.get_lib())"`
CGPUFLAGS = -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L$(TF_LIB) -ltensorflow_framework

If tensorflow.sysconfig.get_lib() cannot find the correct directory, link manually:

CGPUFLAGS = -L$(CUDA_HOME)/lib -L$(CUDA_HOME)/lib64 -lcudart -L /home//.local/lib/python2.7/site-packages/tensorflow -ltensorflow_framework

Env: cuda 9.2, cudnn 7.1, tensorflow 1.9.0, Ubuntu 16.04

For more information:
https://github.com/sampepose/flownet2-tf/issues/41
https://github.com/tensorflow/tensorflow/issues/13607

Here is my Makefile. I made some other changes to make it compile successfully. Makefile.txt
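Before hard-coding a directory in -L, it can help to confirm that the directory really contains the framework library that -ltensorflow_framework is meant to resolve. The sketch below does that with the standard library only. The helper name find_framework_library is made up, and the assumption that some releases add a version suffix to the file name is handled with a wildcard rather than a known list of names.

```python
import glob
import os


def find_framework_library(lib_dir):
    """Return the files in lib_dir that look like the TensorFlow framework
    shared library (libtensorflow_framework.so, with or without a suffix)."""
    pattern = os.path.join(lib_dir, "libtensorflow_framework.so*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # In practice lib_dir would be tensorflow.sysconfig.get_lib();
    # the current directory is used here only so the sketch runs anywhere.
    matches = find_framework_library(".")
    if matches:
        print("found:", matches)
    else:
        print("no libtensorflow_framework.so* in this directory")
```

An empty result means the -L path is wrong for this installation, and the link step will fail or pick up nothing useful.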

SHENG-KAI-HUANG commented 6 years ago

Hello @Iamanorange, I tried your Makefile on Ubuntu 16.04 with tensorflow 1.10.1, cuda 9.0 and cudnn 7.3, and it works well! Thanks for your help!

aa-samad commented 5 years ago

I was facing the same issues. Env: tensorflow 1.11 - cuda 9.0 - python 2.7 - ubuntu 16.04

Story:

1- Removing -D_GLIBCXX_USE_CXX11_ABI=0 can only work for gcc < 5.0.0 (reference: mgharbi/hdrnet_legacy#2).
2- The Makefile that @Iamanorange provided did not work for me (compile error on correlation.so). A solution suggested around the internet is to remove -D GOOGLE_CUDA=1. That compiles successfully, but then gives the same strange "undefined symbol: _ZTIN10tensorflow8OpKernelE" error, just like #41.
3- It turns out the compilation is incomplete with python2.7 and tensorflow1.11 (I don't know the cause yet!).

Solution:

Switch to python 3 :smile:

1- Change python to python3 in the Makefile provided by @Iamanorange.
2- Compile.
3- Edit src/flowlib.py: add from __future__ import print_function at the beginning, and change every print ... to print (...).

This worked for me!

BibratRanjan commented 5 years ago

After modifying the Makefile as @Iamanorange suggested above, I ran into another "undefined symbol" problem:

tensorflow.python.framework.errors_impl.NotFoundError: /media/cds-iisc/DATA/Undertaker/ANT/testing/flownet2-tf-master/src/./ops/build/correlation.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

Any help ?
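The undefined symbols in this thread are mangled C++ names, and reading them shows which TensorFlow class or function the .so file could not find. The sketch below decodes only the simple nested-name form that appears here. It is a teaching aid with a made-up function name; for real work a demangler such as c++filt from binutils is the right tool.

```python
def split_nested_name(symbol):
    """Decode a simple Itanium-ABI nested name, e.g. _ZN3foo3barEv -> foo::bar.

    Handles only _ZN...E and _ZTIN...E with plain length-prefixed components;
    anything else raises ValueError. Parameter encodings after 'E' are ignored.
    """
    if symbol.startswith("_ZTI"):
        rest, kind = symbol[4:], "typeinfo for "
    elif symbol.startswith("_Z"):
        rest, kind = symbol[2:], ""
    else:
        raise ValueError("not a mangled C++ symbol")
    if not rest.startswith("N"):
        raise ValueError("only nested names (N...E) are handled in this sketch")
    pos, parts = 1, []
    while pos < len(rest) and rest[pos] != "E":
        start = pos
        while pos < len(rest) and rest[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported component at offset %d" % pos)
        length = int(rest[start:pos])      # each component is <length><name>
        parts.append(rest[pos:pos + length])
        pos += length
    return kind + "::".join(parts)


if __name__ == "__main__":
    for sym in ("_ZTIN10tensorflow8OpKernelE",
                "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv"):
        print(sym, "->", split_nested_name(sym))
```

Both symbols live in the tensorflow namespace, which is consistent with the fixes discussed in this thread: linking against tensorflow_framework and matching the compiler flags TensorFlow was built with.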

BibratRanjan commented 5 years ago

I'm using Ubuntu 16.04, tensorflow 1.10.1, cuda 9.0, cudnn 7

BibratRanjan commented 5 years ago

and python 3.6.4

BibratRanjan commented 5 years ago

removing -D_GLIBCXX_USE_CXX11_ABI=0 solved the issue

alisaaalehi commented 5 years ago

Hi @BibratRanjan, anything that you change will cause another problem. I've encountered lots of problems and my final and simple solution is this:

fjchange commented 5 years ago

@zhouqixian

That works, thanks! But it can be simplified: just add "--expt-relaxed-constexpr" at the end of line 11 of the original Makefile.

mengyaaa commented 5 years ago

@Iamanorange Thank you so much for your Makefile. After make all succeeded, I ran

python -m src.flownet2.test --input_a data/samples/0img0.ppm --input_b data/samples/0img1.ppm --out ./

It shows:

WARNING:tensorflow:From /home/lab226/Downloads/flownet2-tf-master/src/net.py:22: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/lab226/Downloads/flownet2-tf-master/src/flownet_cs/flownet_cs.py:26: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-12-18 11:25:54.254685: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1261, in _run_fn
    self._extend_graph()
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1295, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1725, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

Caused by op 'FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation', defined at:
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "main", mod_spec)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 51, in
    main()
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 18, in main
    out_path=FLAGS.out,
  File "/home/lab226/Downloads/flownet2-tf-master/src/net.py", line 62, in test
    predictions = self.model(inputs, training_schedule)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/flownet2.py", line 22, in model
    net_css_predictions = self.net_css.model(inputs, training_schedule, trainable=False)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_css/flownet_css.py", line 18, in model
    net_cs_predictions = self.net_cs.model(inputs, training_schedule, trainable=False)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_cs/flownet_cs.py", line 18, in model
    net_c_predictions = self.net_c.model(inputs, training_schedule, trainable=False)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet_c/flownet_c.py", line 40, in model
    cc = correlation(conv_a_3, conv_b_3, 1, 20, 1, 2, 20)
  File "/home/lab226/Downloads/flownet2-tf-master/src/correlation.py", line 14, in correlation
    padding)
  File "", line 53, in correlation
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "main", mod_spec)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 51, in
    main()
  File "/home/lab226/Downloads/flownet2-tf-master/src/flownet2/test.py", line 18, in main
    out_path=FLAGS.out,
  File "/home/lab226/Downloads/flownet2-tf-master/src/net.py", line 68, in test
    saver.restore(sess, checkpoint)
  File "/home/lab226/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1759, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

No OpKernel was registered to support Op 'Correlation' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'

 [[Node: FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/Correlation = Correlation[kernel_size=1, max_displacement=20, pad=20, stride_1=1, stride_2=2](FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3/lrelu/add, FlowNet2/FlowNetCSS/FlowNetCS/FlowNetC/conv3_1/lrelu/add)]]

Env: cuda 9.0, cudnn9.0, tensorflow 1.10.0, Ubuntu 16.04

I still can't get a result. Can anyone help? Thank you very much.

Iamanorange commented 5 years ago

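@mengyaaa FlowNet2 should be run on GPU only. Your log says "Registered devices: [CPU]" while the Correlation kernel is registered only for device='GPU', so TensorFlow is not seeing a GPU.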
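The "No OpKernel was registered" message already contains the two lists needed for this diagnosis: the devices TensorFlow can see and the devices the op has kernels for. The sketch below pulls both out of the message text with regular expressions. The function name and return shape are made up for illustration, and it assumes the message format shown in the log above.

```python
import re


def kernel_device_mismatch(message):
    """Parse a 'No OpKernel was registered' message.

    Returns (registered_devices, kernel_devices, missing), where missing
    lists kernel devices that are not among the registered devices.
    """
    found = re.search(r"Registered devices: \[([^\]]*)\]", message)
    registered = [d.strip() for d in found.group(1).split(",")] if found else []
    kernels = re.findall(r"device='([A-Za-z]+)'", message)
    missing = [k for k in kernels if k not in registered]
    return registered, kernels, missing


if __name__ == "__main__":
    msg = ("No OpKernel was registered to support Op 'Correlation' with "
           "these attrs. Registered devices: [CPU], "
           "Registered kernels: device='GPU'")
    registered, kernels, missing = kernel_device_mismatch(msg)
    print("visible devices:", registered)
    print("kernels exist for:", kernels)
    print("needed but not visible:", missing)
```

When the last list is non-empty, the op was built correctly but the process cannot see a matching device, which points at the TensorFlow install or driver setup rather than at the Makefile.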

aminzabardast commented 5 years ago

@Iamanorange Thank you! Your Makefile solved my compiling issue on Tensorflow 1.10.0 with CUDA 9.0.

ZCMax commented 5 years ago

Quoting @alisaaalehi:

Hi @BibratRanjan, anything that you change will cause another problem. I've encountered lots of problems and my final and simple solution is this:

  • Use tensorflow 1.2.0-gpu to fix most of the problems. It is better to use the Docker image of that version (tensorflow/tensorflow:1.2.1-gpu). It has everything needed to run this code. Since you are using the gpu version of tensorflow, remember to use nvidia-docker to create a container from the image.
  • You had better switch g++ to version 4.8: apt-get install g++-4.8
  • Update the Makefile to match this version: change CC = gcc -O2 -pthread to CC = gcc-4.8 -O2 -pthread, and CXX = g++ to CXX = g++-4.8

I used the Docker image of that version (tensorflow/tensorflow:1.2.1-gpu) and ran the Makefile successfully, but I had a problem installing python-tk in the Docker container. When I run "apt-get install python-tk", it tells me that I have to run "apt-get update" first, but "apt-get update" always stops at "0% working". Did you encounter a similar problem?

Iamanorange commented 5 years ago

but I got some problems when I install the python-tk in the Docker container, I try to use " apt-get install python-tk "

TkInter (python-tk) is used to draw GUIs. It is not needed in this case, so you can ignore it.

AloshkaD commented 4 years ago

Changing "-D_GLIBCXX_USE_CXX11_ABI=0" to "-D_GLIBCXX_USE_CXX11_ABI=1" worked for me.
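Across this thread, removing the flag, keeping it at 0, and setting it to 1 each worked for someone, which suggests the right value depends on how the installed TensorFlow binary was built. The sketch below reads the value out of a list of compiler flags. The helper name is made up. In TensorFlow releases that provide tf.sysconfig.get_compile_flags(), its return value could be passed in, but that call is not made here so the example runs without TensorFlow.

```python
def cxx11_abi_setting(compile_flags):
    """Return the integer value of -D_GLIBCXX_USE_CXX11_ABI found in a list
    of compiler flags, or None when the flag is absent."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in compile_flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


if __name__ == "__main__":
    # Example flags only; real ones would come from the TensorFlow install.
    flags = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
    value = cxx11_abi_setting(flags)
    if value is None:
        print("flag not set; the compiler default applies")
    else:
        print("build custom ops with -D_GLIBCXX_USE_CXX11_ABI=%d" % value)
```

Using the same value for the custom ops as for TensorFlow itself keeps the mangled std::string symbols consistent between the two, which is the usual cause of these undefined-symbol errors.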