skyflynil / stylegan2

StyleGAN2 - Official TensorFlow Implementation with practical improvements
http://arxiv.org/abs/1912.04958

tensorflow.python.framework.errors_impl.NotFoundError: /root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputESs #27

Closed Maglanyulan closed 3 years ago

Maglanyulan commented 3 years ago

When I run

$ python run_training.py --num-gpus=1 --data-dir=/data/ --config=config-f --dataset=dataset --mirror-augment=true --metric=none --total-kimg=20000 --result-dir="~/chenyulan/data/results"

I get the following error:

Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Compiling... Loading... Failed!
Traceback (most recent call last):
  File "run_training.py", line 230, in <module>
    main()
  File "run_training.py", line 225, in main
    run(**vars(args))
  File "run_training.py", line 144, in run
    dnnlib.submit_run(**kwargs)
  File "/root/stylegan2_train/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/root/stylegan2_train/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/root/stylegan2_train/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/root/stylegan2_train/training/training_loop.py", line 179, in training_loop
    G = tflib.Network('G', num_channels=training_set.shape[0], resolution=training_set.shape[1], label_size=training_set.label_size, **G_args)
  File "/root/stylegan2_train/dnnlib/tflib/network.py", line 97, in __init__
    self._init_graph()
  File "/root/stylegan2_train/dnnlib/tflib/network.py", line 154, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/root/stylegan2_train/training/networks_stylegan2.py", line 288, in G_main
    components.synthesis = tflib.Network('G_synthesis', func_name=globals()[synthesis_func], **kwargs)
  File "/root/stylegan2_train/dnnlib/tflib/network.py", line 97, in __init__
    self._init_graph()
  File "/root/stylegan2_train/dnnlib/tflib/network.py", line 154, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/root/stylegan2_train/training/networks_stylegan2.py", line 641, in G_synthesis_stylegan2
    x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
  File "/root/stylegan2_train/training/networks_stylegan2.py", line 565, in layer
    x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
  File "/root/stylegan2_train/training/networks_stylegan2.py", line 100, in modulated_conv2d_layer
    s = apply_bias_act(s, bias_var=mod_bias_var) + 1 # [BI] Add bias (initially 1).
  File "/root/stylegan2_train/training/networks_stylegan2.py", line 69, in apply_bias_act
    return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, alpha=alpha, gain=gain)
  File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
  File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
    cuda_kernel = _get_plugin().fused_bias_act
  File "/root/stylegan2_train/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/root/stylegan2_train/dnnlib/tflib/custom_ops.py", line 160, in get_plugin
    plugin = tf.load_op_library(bin_file)
  File "/root/anaconda3/envs/stylegan2/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so: undefined symbol: _ZN10tensorflow12OpDefBuilder6OutputESs
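For anyone reading the undefined symbol above: in the Itanium C++ name mangling used by GCC, a trailing `Ss` is the abbreviation for the old-ABI `std::string`, while the C++11 ABI spells the type out with a `__cxx11` component. That suggests the plugin was compiled for the old string ABI and the loaded TensorFlow library may not export that variant. The snippet below is only a rough sketch of that reading; `guess_libstdcxx_abi` is a hypothetical helper that does plain substring matching and is not a real demangler.

```python
def guess_libstdcxx_abi(mangled_symbol):
    """Heuristically guess which libstdc++ string ABI a mangled symbol uses.

    Returns 1 for the C++11 ABI, 0 for the old ABI, and None when no
    std::string marker is recognized. Substring matching only.
    """
    if "__cxx11" in mangled_symbol:
        # The new ABI mangles std::string as std::__cxx11::basic_string<...>.
        return 1
    if "Ss" in mangled_symbol:
        # "Ss" is the Itanium abbreviation for the old-ABI std::string.
        return 0
    return None


symbol = "_ZN10tensorflow12OpDefBuilder6OutputESs"
print(guess_libstdcxx_abi(symbol))  # prints 0
```

A result of 0 here would be consistent with a plugin built using `-D_GLIBCXX_USE_CXX11_ABI=0`.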

I tried the following: in stylegan2/dnnlib/tflib/custom_ops.py, line 127, I changed

compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=0\''

to

compile_opts += ' --compiler-options \'-fPIC -D_GLIBCXX_USE_CXX11_ABI=1\''

but it didn't work.
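Before editing the flag by hand, it may help to check which ABI value the installed TensorFlow reports for itself. The sketch below assumes `tf.sysconfig.get_compile_flags()` is available in the installed version (it is part of the public TensorFlow API in 1.x); `abi_from_flags` is a hypothetical helper introduced here for illustration.

```python
def abi_from_flags(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value found in a list of compiler flags, or None."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # Compare this value with the one hard-coded in custom_ops.py.
        print("TensorFlow build ABI:", abi_from_flags(tf.sysconfig.get_compile_flags()))
```

If the reported value differs from the one in `compile_opts`, the plugin and the TensorFlow library would likely disagree on how `std::string` symbols are mangled.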

Running ldd on the compiled plugin gives:

$ ldd /root/stylegan2_train/dnnlib/tflib/_cudacache/fused_bias_act_ec21d79f0dc288505704f796449a968e.so
    linux-vdso.so.1 => (0x00007ffd1adb2000)
    _pywrap_tensorflow_internal.so => not found
    librt.so.1 => /usr/lib64/librt.so.1 (0x00007eff848ff000)
    libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007eff846e3000)
    libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007eff844df000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007eff84e2d000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x00007eff841dd000)
    libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007eff83fc7000)
    libc.so.6 => /usr/lib64/libc.so.6 (0x00007eff83c05000)
    /lib64/ld-linux-x86-64.so.2 (0x00007eff84da4000)

Environment:
python=3.6.12
tensorflow-gpu=1.14
cuda: 10.0
cuDNN: 7.6.5

Maglanyulan commented 3 years ago

I reinstalled TensorFlow with "pip install tensorflow-gpu==1.14.0", and then it worked.