princeton-vl / infinigen

Infinite Photorealistic Worlds using Procedural Generation
https://infinigen.org
BSD 3-Clause "New" or "Revised" License

Error: Failed to create CUDA context #234

Open TomTomTommi opened 1 month ago

TomTomTommi commented 1 month ago

I am trying to install Infinigen on a cluster, following the installation commands. Because nvcc is not at the default nvcc_location="/usr/local/cuda/bin/nvcc", simply running make terrain skips the CUDA part. So I changed it to nvcc_location=$(which nvcc), and compilation and installation succeeded. However, when I try to run the Hello World demo code using cuda_terrain, the following error occurs:

/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[16:09:57.117] [root] [WARNING] | SMB_AUTH envvar is not set, smb_client upload will not work. Ignore this message if not using upload
[16:09:57.493] [infinigen.core.init] [INFO] | Converted seed='0' to scene_seed=0, parsed as hexadecimal
[16:09:57.552] [infinigen.core.execute_tasks] [INFO] | infinigen version 1.2.5
[16:09:57.552] [infinigen.core.execute_tasks] [INFO] | CUDA_VISIBLE_DEVICES=4
[16:09:57.554] [infinigen.times] [INFO] | [MAIN TOTAL]
[16:09:57.558] [infinigen.times] [INFO] | [Reading input blendfile]
[16:09:59.188] [infinigen.times] [INFO] | [Reading input blendfile] finished in 0:00:01.629659
[16:09:59.189] [root] [WARNING] | Re-initialized 0 as trusted. Do not run infinigen on untrusted blend files. 
[16:09:59.189] [infinigen.core.execute_tasks] [INFO] | Processing frames 48 through 48 inclusive
[16:09:59.422] [infinigen.times] [INFO] | [Create terrain]
[16:09:59.422] [infinigen.terrain.core] [INFO] | Terrain using only on the fly on_the_fly_asset_folder=PosixPath('/rds/general/user/jj323/home/PycharmProjects/infinigen/outputs/hello_world4/0/frames_0_0_0048_0/assets')
[16:10:00.011] [infinigen.terrain.core] [INFO] | Terrain elements: ['ground', 'landtiles', 'warped_rocks', 'voronoi_rocks', 'atmosphere']
[16:10:00.011] [infinigen.times] [INFO] | [Create terrain] finished in 0:00:00.588558
[16:10:00.011] [infinigen.times] [INFO] | [Render Frames]
[16:10:00.011] [infinigen.times] [INFO] | [Enable GPU]
[16:10:00.088] [infinigen.infinigen_gpl.extras.enable_gpu] [INFO] | Device Quadro RTX 6000 of type OPTIX found and used.
[16:10:00.088] [infinigen.times] [INFO] | [Enable GPU] finished in 0:00:00.076616
[16:10:00.088] [infinigen.times] [INFO] | [Render/Cycles settings]
[16:10:00.090] [infinigen.times] [INFO] | [Render/Cycles settings] finished in 0:00:00.002150
[16:10:00.090] [infinigen.times] [INFO] | [Compositing Setup]
[16:10:00.095] [infinigen.times] [INFO] | [Compositing Setup] finished in 0:00:00.004489
[16:10:00.095] [infinigen.times] [INFO] | [get_camera]
[16:10:00.095] [infinigen.times] [INFO] | [get_camera] finished in 0:00:00.000121
[16:10:00.095] [infinigen.times] [INFO] | [Actual rendering]
Failed to create CUDA context (Unknown CUDA error value)

Refer to the Cycles GPU rendering documentation for possible solutions:
https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html

Invalid value in cuCtxPushCurrent(device->cuContext) (intern/cycles/device/cuda/util.cpp:13)
Invalid context in cuStreamCreate(&cuda_stream_, CU_STREAM_NON_BLOCKING) (intern/cycles/device/cuda/queue.cpp:20)
Invalid context in cuCtxPopCurrent(NULL) (intern/cycles/device/cuda/util.cpp:18)
Invalid value in cuCtxPushCurrent(device->cuContext) (intern/cycles/device/cuda/util.cpp:13)
Invalid context in cuCtxPopCurrent(NULL) (intern/cycles/device/cuda/util.cpp:18)
Invalid value in cuCtxPushCurrent(device->cuContext) (intern/cycles/device/cuda/util.cpp:13)
Invalid context in cuCtxPopCurrent(NULL) (intern/cycles/device/cuda/util.cpp:18)
Invalid value in cuCtxDestroy_v2(cuContext) (intern/cycles/device/cuda/device_impl.cpp:126)
[16:10:02.170] [infinigen.times] [INFO] | [Actual rendering] failed with <class 'RuntimeError'>
[16:10:02.172] [infinigen.times] [INFO] | [Render Frames] failed with <class 'RuntimeError'>
[16:10:02.172] [infinigen.times] [INFO] | [MAIN TOTAL] failed with <class 'RuntimeError'>
Traceback (most recent call last):
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 438, in <module>
    main(args)
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 409, in main
    execute_tasks.main(
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen/core/execute_tasks.py", line 418, in main
    execute_tasks(
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen/core/execute_tasks.py", line 380, in execute_tasks
    render(
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen/core/execute_tasks.py", line 227, in render
    render_image_func(frames_folder=Path(output_folder), camera_id=camera_id)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 670, in scoping_wrapper
    return fn_or_cls(*args, **kwargs)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/rds/general/user/jj323/home/PycharmProjects/infinigen/infinigen/core/rendering/render.py", line 357, in render_image
    bpy.ops.render.render(animation=True)
  File "/rds/general/user/jj323/home/anaconda3/envs/infinigen/lib/python3.10/site-packages/bpy/3.6/scripts/modules/bpy/ops.py", line 113, in __call__
    ret = _op_call(self.idname_py(), None, kw)
RuntimeError: Error: Failed to create CUDA context (Unknown CUDA error value)
Error: Failed to create CUDA context (Unknown CUDA error value)

  In call to configurable 'render_image' (<function render_image at 0x1538bfdc71c0>) in scope 'full'
  In call to configurable 'render' (<function render at 0x1538bfdc43a0>)
  In call to configurable 'execute_tasks' (<function execute_tasks at 0x1538bfdc7c70>)

What should I do in this situation, where nvcc is not at the default location /usr/local/cuda/bin/nvcc but at /sw-eb/software/CUDA/12.2.2/bin/nvcc?
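For reference, the nvcc lookup described in this comment (prefer whatever `which nvcc` finds, otherwise the default path) can be sketched in Python. This is only an illustration of the lookup order, not Infinigen's install script: `resolve_nvcc` and `DEFAULT_NVCC` are hypothetical names, and the sketch concerns the build step only, not the render error in the log.

```python
import os
import shutil

# Default path quoted in the issue; the name DEFAULT_NVCC is illustrative.
DEFAULT_NVCC = "/usr/local/cuda/bin/nvcc"


def resolve_nvcc(default=DEFAULT_NVCC):
    """Return a usable nvcc path, or None if none can be found."""
    # Prefer whatever is on PATH (the equivalent of `which nvcc`),
    # e.g. a module-provided CUDA toolkit on a cluster.
    found = shutil.which("nvcc")
    if found:
        return found
    # Otherwise fall back to the default location, if it is executable.
    if os.path.isfile(default) and os.access(default, os.X_OK):
        return default
    return None


if __name__ == "__main__":
    print(resolve_nvcc() or "nvcc not found")
```

On a cluster that provides CUDA through environment modules, the toolkit module would need to be loaded first so that nvcc is on PATH.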

araistrick commented 1 month ago

@mazeyu

mazeyu commented 1 month ago

This no longer seems related to terrain but to Blender rendering. Maybe check the link in the error message to see if there is anything useful.

TomTomTommi commented 1 month ago

The log files are attached: 70529843081_0_log.err.txt 70529843081_0_log.out.txt

araistrick commented 1 month ago

This does indeed seem to be an issue with Blender rendering, not Infinigen itself. Please check that other CUDA applications work on the machine, and also try doing a CUDA-enabled render via the Blender UI to test.
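One way to check whether CUDA works on the machine independently of Blender is to ask the CUDA driver directly for a context. The following is a minimal sketch, assuming a Linux node where the driver library is loadable as `libcuda.so.1` and still exports `cuCtxCreate_v2`; the function name `try_cuda_context` is hypothetical. Note that the device ordinal is relative to `CUDA_VISIBLE_DEVICES`, so with `CUDA_VISIBLE_DEVICES=4` the single visible GPU is ordinal 0.

```python
import ctypes


def try_cuda_context(ordinal=0, libname="libcuda.so.1"):
    """Try to create and destroy a CUDA context; return (ok, message)."""
    try:
        cuda = ctypes.CDLL(libname)
    except OSError as exc:
        return False, f"could not load {libname}: {exc}"

    # Every driver API call returns a CUresult; 0 means CUDA_SUCCESS.
    rc = cuda.cuInit(0)
    if rc != 0:
        return False, f"cuInit failed with CUresult {rc}"

    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0 or count.value <= ordinal:
        return False, f"device {ordinal} unavailable (CUresult {rc}, count {count.value})"

    device = ctypes.c_int(0)
    rc = cuda.cuDeviceGet(ctypes.byref(device), ordinal)
    if rc != 0:
        return False, f"cuDeviceGet failed with CUresult {rc}"

    # This is the step that fails in the log above when Cycles attempts it.
    context = ctypes.c_void_p()
    rc = cuda.cuCtxCreate_v2(ctypes.byref(context), 0, device)
    if rc != 0:
        return False, f"cuCtxCreate failed with CUresult {rc}"

    cuda.cuCtxDestroy_v2(context)
    return True, f"created and destroyed a context on device {ordinal}"


if __name__ == "__main__":
    ok, message = try_cuda_context()
    print("OK" if ok else "FAILED", "-", message)
```

If this also fails on the compute node, the problem most likely lies with the driver or the node configuration rather than with Blender or Infinigen. If it succeeds, a CUDA-enabled test render in Blender is the next thing to try.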