google / nerfactor

Neural Factorization of Shape and Reflectance Under an Unknown Illumination
https://xiuming.info/projects/nerfactor/
Apache License 2.0

I get an error when I train vanilla NeRF. #4

Closed bruinxiong closed 3 years ago

bruinxiong commented 3 years ago

Hi, thanks for your nice work. However, I get an error when I try to train vanilla NeRF following the instructions.

The error is printed as follows:

[trainvali] For results, see: /home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[datasets/nerf] Number of 'train' views: 100
Traceback (most recent call last):
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 348, in <module>
    app.run(main)
  File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 91, in main
    datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
  File "../nerfactor/datasets/base.py", line 115, in build_pipeline
    dataset = dataset.map(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1623, in map
    return ParallelMapDataset(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4016, in __init__
    self._map_func = StructuredFunctionWrapper(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3221, in __init__
    self._function = wrapper_fn.get_concrete_function()
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2531, in get_concrete_function
    graph_function = self._get_concrete_function_garbage_collected(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2496, in _get_concrete_function_garbage_collected
    graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3214, in wrapper_fn
    ret = _wrapper_helper(*args)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3156, in _wrapper_helper
    ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 505, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3299, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in user code:

/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:111 _process_example_postcache  *
    rayo, rayd, rgb = self._sample_rays(self.rayo, self.rayd, self.rgb)
/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:130 _sample_rays  *
    coords = tf.stack(
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:3391 meshgrid  **
    mult_fact = ones(shapes, output_dtype)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2967 ones
    output = _constant_if_small(one, shape, dtype, name)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2662 _constant_if_small
    if np.prod(shape) < 1000:
<__array_function__ internals>:5 prod

/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3051 prod
    return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:86 _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:748 __array__
    raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"

NotImplementedError: Cannot convert a symbolic Tensor (meshgrid/Size:0) to a numpy array.

Any suggestions?

bruinxiong commented 3 years ago

I followed https://github.com/tensorflow/models/issues/9706 to fix the issue above, but now I get another error:

(base) linxiong:nerfactor$ bash trainvali_run.sh
2021-08-16 16:55:04.319120: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-16 16:55:04.389781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.390613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.391180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:b3:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.391415: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-16 16:55:04.392896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-16 16:55:04.394275: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-16 16:55:04.394544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-16 16:55:04.396130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-16 16:55:04.396979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-16 16:55:04.400638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-16 16:55:04.404467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2
2021-08-16 16:55:04.404805: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-08-16 16:55:04.410786: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3499910000 Hz
2021-08-16 16:55:04.411172: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fbb2c000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-08-16 16:55:04.411197: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-08-16 16:55:04.640402: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562d39dcfa40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-16 16:55:04.640433: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-16 16:55:04.640441: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-16 16:55:04.640448: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-16 16:55:04.644029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.644549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.645066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:b3:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-16 16:55:04.645111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-16 16:55:04.645124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-16 16:55:04.645142: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-16 16:55:04.645155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-16 16:55:04.645167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-16 16:55:04.645179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-16 16:55:04.645193: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-16 16:55:04.648050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2
2021-08-16 16:55:04.648087: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-16 16:55:04.649975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-16 16:55:04.649994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1 2
2021-08-16 16:55:04.650003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y Y
2021-08-16 16:55:04.650011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N Y
2021-08-16 16:55:04.650019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2: Y Y N
2021-08-16 16:55:04.654752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10372 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2021-08-16 16:55:04.655613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10093 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2021-08-16 16:55:04.656427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10372 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:b3:00.0, compute capability: 6.1)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
I0816 16:55:04.659288 140447264786240 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
[util/io] Output directory already exisits: /home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[util/io] Overwrite is off, so doing nothing
[trainvali] For results, see: /home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[datasets/nerf] Number of 'train' views: 100
Traceback (most recent call last):
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4675, in parallel_map_dataset
    _result = pywrap_tfe.TFE_Py_FastPathExecute(
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 350, in <module>
    app.run(main)
  File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 93, in main
    datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
  File "../nerfactor/datasets/base.py", line 116, in build_pipeline
    dataset = dataset.map(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1623, in map
    return ParallelMapDataset(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4043, in __init__
    variant_tensor = gen_dataset_ops.parallel_map_dataset(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4684, in parallel_map_dataset
    return parallel_map_dataset_eager_fallback(
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4761, in parallel_map_dataset_eager_fallback
    _attr_Targuments, other_arguments = _execute.convert_to_mixed_eager_tensors(other_arguments, ctx)
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 283, in convert_to_mixed_eager_tensors
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
  File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 283, in <listcomp>
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
AttributeError: 'Tensor' object has no attribute '_datatype_enum'

xiumingzhang commented 3 years ago

https://stackoverflow.com/questions/58479556/notimplementederror-cannot-convert-a-symbolic-tensor-2nd-target0-to-a-numpy seems to suggest this is a NumPy version mismatch issue.

Have you tried setting up a fresh Conda environment using the environment.yml of this repo?
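
One way to test the version-mismatch hypothesis is to print the versions the running interpreter actually imports and compare them against the pins in environment.yml. The sketch below is only an illustration: the helper names (installed_version, version_prefix_matches, report) are hypothetical and not part of this repo, and the "2.2" prefix for TensorFlow is taken from later in this thread. The NumPy pin is not stated here, so it has to be read from environment.yml.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_prefix_matches(actual, expected_prefix):
    """True if `actual` equals the prefix or extends it at a dot boundary."""
    if actual is None:
        return False
    return actual == expected_prefix or actual.startswith(expected_prefix + ".")


def report(expected):
    """Map each module name to (actual_version, matches_expected_prefix)."""
    result = {}
    for name, prefix in expected.items():
        actual = installed_version(name)
        result[name] = (actual, version_prefix_matches(actual, prefix))
    return result


if __name__ == "__main__":
    # "2.2" is the TF version mentioned in this thread; other pins are
    # assumptions to be filled in from environment.yml.
    for name, (actual, ok) in report({"tensorflow": "2.2"}).items():
        print(name, actual, "OK" if ok else "MISMATCH")
    print("numpy", installed_version("numpy"))
```

Matching on a dot boundary keeps "2.20.0" from being accepted as "2.2". Running this with the same interpreter that launches trainvali.py is what makes the result meaningful.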

bruinxiong commented 3 years ago

@xiumingzhang Thank you for your quick reply. Yes, I tried creating a new Conda environment using environment.yml, but I immediately get the above error. The exact location in the code is the _process_example_postcache function in nerf.py, and it is also related to the tf.stack op in the _sample_rays function.

bruinxiong commented 3 years ago

@xiumingzhang After creating a new Conda environment using environment.yml, I get the same error as above.

(/home/linxiong/.conda/env/nerfactor) linxiong:nerfactor$ bash trainvali_run.sh
2021-08-18 09:34:57.981556: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 09:34:58.054786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.055606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.056281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:b3:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.056475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 09:34:58.057963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 09:34:58.059375: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 09:34:58.059635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 09:34:58.061163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 09:34:58.062120: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 09:34:58.065684: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 09:34:58.070558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2
2021-08-18 09:34:58.070872: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-08-18 09:34:58.076698: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3499910000 Hz
2021-08-18 09:34:58.077118: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe524000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-08-18 09:34:58.077148: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-08-18 09:34:58.284690: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5638da4f7360 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-18 09:34:58.284729: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-18 09:34:58.284738: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-18 09:34:58.284745: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-18 09:34:58.290014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.290549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.291044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties:
pciBusID: 0000:b3:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6325GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-08-18 09:34:58.291097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 09:34:58.291113: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 09:34:58.291126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 09:34:58.291139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 09:34:58.291152: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 09:34:58.291170: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 09:34:58.291187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 09:34:58.293966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2
2021-08-18 09:34:58.294005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 09:34:58.295807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-18 09:34:58.295825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1 2
2021-08-18 09:34:58.295834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y Y
2021-08-18 09:34:58.295843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N Y
2021-08-18 09:34:58.295854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2: Y Y N
2021-08-18 09:34:58.297826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10372 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2021-08-18 09:34:58.298675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10104 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2021-08-18 09:34:58.299561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10372 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:b3:00.0, compute capability: 6.1)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
I0818 09:34:58.302564 140628559923008 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
[trainvali] For results, see: /home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[datasets/nerf] Number of 'train' views: 100
Traceback (most recent call last):
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4680, in parallel_map_dataset
    sloppy, "preserve_cardinality", preserve_cardinality)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 350, in <module>
    app.run(main)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 93, in main
    datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
  File "../nerfactor/datasets/base.py", line 119, in build_pipeline
    num_parallel_calls=self.n_map_parallel_calls)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1628, in map
    preserve_cardinality=True)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4050, in __init__
    **self._flat_structure)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4688, in parallel_map_dataset
    preserve_cardinality=preserve_cardinality, name=name, ctx=_ctx)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4761, in parallel_map_dataset_eager_fallback
    _attr_Targuments, other_arguments = _execute.convert_to_mixed_eager_tensors(other_arguments, ctx)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 283, in convert_to_mixed_eager_tensors
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 283, in <listcomp>
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
AttributeError: 'Tensor' object has no attribute '_datatype_enum'

xiumingzhang commented 3 years ago

Thanks for the detailed stack trace. This looks like a TF "eager vs. graph mode" issue. Our code was developed with TF 2.2 running in eager mode. Could you please confirm that the TF actually being used is 2.2, by running print(tf.__version__)? Sometimes the version you think is in use is not the one that actually gets loaded. Thanks.
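
A slightly fuller version of that check might also show which interpreter is running, where TensorFlow was loaded from, and whether eager execution is on. This is a minimal sketch, not code from this repo: describe_tensorflow is a hypothetical helper name, and it only uses sys.executable, tf.__version__, tf.__file__, and tf.executing_eagerly(). The location is worth printing because the first traceback in this thread loads TensorFlow from ~/.local/lib/python3.8/site-packages, while the later one loads it from the Conda environment.

```python
import sys


def describe_tensorflow():
    """Collect facts about the TensorFlow this interpreter actually loads."""
    info = {"python": sys.executable}
    try:
        import tensorflow as tf
    except ImportError:
        info["tensorflow"] = None  # not importable from this interpreter
        return info
    info["tensorflow"] = tf.__version__
    info["location"] = tf.__file__  # shows which site-packages was used
    info["eager"] = tf.executing_eagerly()  # True when eager mode is active
    return info


if __name__ == "__main__":
    for key, value in describe_tensorflow().items():
        print(f"{key}: {value}")
```

Running it from the same shell and environment as trainvali_run.sh gives the most relevant answer, since a different interpreter may resolve a different TensorFlow installation.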


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 350, in <module>
    app.run(main)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 93, in main
    datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
  File "../nerfactor/datasets/base.py", line 119, in build_pipeline
    num_parallel_calls=self.n_map_parallel_calls)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1628, in map
    preserve_cardinality=True)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4050, in __init__
    **self._flat_structure)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4688, in parallel_map_dataset
    preserve_cardinality=preserve_cardinality, name=name, ctx=_ctx)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 4761, in parallel_map_dataset_eager_fallback
    _attr_Targuments, other_arguments = _execute.convert_to_mixed_eager_tensors(other_arguments, ctx)
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 283, in convert_to_mixed_eager_tensors
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
  File "/home/linxiong/.conda/env/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 283, in <listcomp>
    types = [t._datatype_enum() for t in v]  # pylint: disable=protected-access
AttributeError: 'Tensor' object has no attribute '_datatype_enum'


xiumingzhang commented 3 years ago

Closing due to no response. Please reopen this if you still have problems.