sh4174 / 3DStyleGAN

3D StyleGAN2 for Medical Images

tensorflow.python.framework.errors_impl.NotFoundError: /work16t/yshi/3DStyleGAN/dnnlib/tflib/_cudacache/fused_bias_act_360d764a7f04fadac3b4f82a88fc103e.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefESt17basic_string_viewIcSt11char_traitsIcEESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteISA_EE #7

Open yshi20 opened 1 year ago

yshi20 commented 1 year ago

Hi,

I'm trying to use this codebase to train on my own data, but I hit an error I have no clue how to solve. My environment is an RTX 3090 with the 470 driver, CUDA 11.8, cuDNN 8.6, and TensorFlow 2.4.0. Here is the error message:

Compiling... Loading... Failed!
Traceback (most recent call last):
  File "run_training.py", line 588, in <module>
    main()
  File "run_training.py", line 583, in main
    run(**vars(args))
  File "run_training.py", line 512, in run
    dnnlib.submit_run(**kwargs)
  File "/work16t/yshi/3DStyleGAN/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/work16t/yshi/3DStyleGAN/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/work16t/yshi/3DStyleGAN/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/work16t/yshi/3DStyleGAN/training/training_loop_3d.py", line 165, in training_loop
    G = tflib.Network('G', num_channels=training_set.shape[0], resolution=min_res, label_size=training_set.label_size, **G_args)
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/network.py", line 99, in __init__
    self._init_graph()
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/network.py", line 156, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/work16t/yshi/3DStyleGAN/training/networks3d_stylegan2.py", line 195, in G_main
    components.synthesis = tflib.Network('G_synthesis', func_name=globals()[synthesis_func], **kwargs)
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/network.py", line 99, in __init__
    self._init_graph()
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/network.py", line 156, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/work16t/yshi/3DStyleGAN/training/networks3d_stylegan2.py", line 425, in G_synthesis_stylegan2_3d_curated_real
    x = layer(x, layer_idx=0, fmaps=nf(1), kernel=3)
  File "/work16t/yshi/3DStyleGAN/training/networks3d_stylegan2.py", line 376, in layer
    x = modulated_conv3d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
  File "/work16t/yshi/3DStyleGAN/training/networks3d_stylegan2.py", line 106, in modulated_conv3d_layer
    s = apply_bias_act(s, bias_var=mod_bias_var) + 1 # [BI] Add bias (initially 1).
  File "/work16t/yshi/3DStyleGAN/training/networks3d_stylegan2.py", line 75, in apply_bias_act
    return fused_bias_act(x, b=tf.cast(b, x.dtype), act=act, alpha=alpha, gain=gain)
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
    cuda_kernel = _get_plugin().fused_bias_act
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/work16t/yshi/3DStyleGAN/dnnlib/tflib/custom_ops.py", line 180, in get_plugin
    plugin = tf.load_op_library(bin_file)
  File "/home/yshi/anaconda3/envs/3dgan_tf2/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /work16t/yshi/3DStyleGAN/dnnlib/tflib/_cudacache/fused_bias_act_360d764a7f04fadac3b4f82a88fc103e.so: undefined symbol: _ZN10tensorflow14kernel_factory17OpKernelRegistrar12InitInternalEPKNS_9KernelDefESt17basic_string_viewIcSt11char_traitsIcEESt10unique_ptrINS0_15OpKernelFactoryESt14default_deleteISA_EE

razvanmarinescu commented 1 year ago

I'm not sure how to help debug an issue that is specific to a TensorFlow environment. It might be better to ask about this on the TensorFlow GitHub.

fedorukol commented 1 year ago

Hi! I ran into the same problem while trying to run generation on Google Colab. After some fights with condacolab and environment configuration, I found a workaround.

Change line 68 in ./3DStyleGAN/dnnlib/tflib/ops/fused_bias_act.py to:

return impl_dict["ref"](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)

This forces the slower reference implementation instead of the compiled CUDA kernel, but it gets the job done.
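
The workaround above hard-codes the "ref" key, so the compiled plugin is never loaded. A gentler variant would try the requested implementation first and fall back to "ref" only when loading fails. The following is a minimal, framework-free sketch of that dispatch pattern, not the repository's code: `bias_act`, `_bias_act_ref`, `_bias_act_cuda`, and the simulated `OSError` are all hypothetical stand-ins (in TensorFlow the failure surfaces as `NotFoundError`, as in the traceback).

```python
import math

def _bias_act_ref(x, b, alpha=0.2, gain=math.sqrt(2)):
    """Hypothetical reference path: add bias, apply leaky ReLU, scale by gain."""
    out = []
    for v in x:
        y = v + b
        out.append(gain * (y if y >= 0 else alpha * y))
    return out

def _bias_act_cuda(x, b, alpha=0.2, gain=math.sqrt(2)):
    """Hypothetical fast path; here it only simulates a plugin that fails to load."""
    raise OSError("undefined symbol (simulated plugin load failure)")

# Same idea as the impl_dict lookup in the traceback: name -> implementation.
impl_dict = {"ref": _bias_act_ref, "cuda": _bias_act_cuda}

def bias_act(x, b, impl="cuda", **kwargs):
    """Use the requested implementation; fall back to 'ref' if it cannot load."""
    try:
        return impl_dict[impl](x, b, **kwargs)
    except OSError:
        return impl_dict["ref"](x, b, **kwargs)
```

With this shape, callers keep requesting "cuda" and get the fast path wherever the plugin loads, while broken environments degrade to the slower path instead of crashing.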

yshi20 commented 1 year ago

That worked out! Thanks!