princeton-vl / infinigen

Infinite Photorealistic Worlds using Procedural Generation
https://infinigen.org
BSD 3-Clause "New" or "Revised" License

Low GPU usage #228

Closed. TomTomTommi closed this issue 1 month ago

TomTomTommi commented 2 months ago

I ran the following command:

python -m infinigen.datagen.manage_jobs --output_folder outputs/my_videos --num_scenes 500     --pipeline_config local_16GB monocular_video cuda_terrain opengl_gt     --cleanup big_files --warmup_sec 60000 --config video high_quality_terrain

The current status output is as follows:

outputs/my_videos 04/24 10:18AM -> 04/25 01:22AM
============================================================
control_state/curr_concurrent_max : 1
control_state/disk_usage       : 0.29
control_state/n_in_flight      : 1
control_state/try_to_launch    : 0
control_state/will_launch      : 0
crashed/coarse                 : 173
crashed/fineterrain            : 14
crashed/total                  : 187
running/coarse                 : 1
running/total                  : 1
succeeded/coarse               : 14
succeeded/total                : 14
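
To put the crash counts in perspective, the status printout can be reduced to a single crash fraction. The following is a minimal sketch, not part of infinigen: `parse_status` is a hypothetical helper that assumes the `key : value` layout shown above.

```python
def parse_status(text):
    """Parse 'key : value' lines from a status printout into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as '=====' have no colon
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            continue  # header lines (dates, times) are not numeric
    return stats


# Counts copied from the status output above
status = """\
crashed/coarse                 : 173
crashed/fineterrain            : 14
crashed/total                  : 187
succeeded/total                : 14
"""

stats = parse_status(status)
finished = stats["crashed/total"] + stats["succeeded/total"]
crash_fraction = stats["crashed/total"] / finished
print(f"{crash_fraction:.0%} of finished jobs crashed")
```

With these numbers, nearly every finished job has crashed, which points to a systematic failure rather than occasional bad seeds.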

The job is still running after 15 hours, and I am not sure this is expected. Why are there so many crashes? Also, according to other issues, GPU memory consumption should be about 20GB, but my GPU usage is quite low:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   34C    P8              20W / 500W |    641MiB / 24564MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1142      G   /usr/lib/xorg/Xorg                           59MiB |
|    0   N/A  N/A      1868      G   /usr/lib/xorg/Xorg                          192MiB |
|    0   N/A  N/A      1999      G   /usr/bin/gnome-shell                        164MiB |
|    0   N/A  N/A      3672      G   ...erProcess --variations-seed-version       81MiB |
|    0   N/A  N/A     11672      G   /opt/teamviewer/tv_bin/TeamViewer            23MiB |
|    0   N/A  N/A   1517432      G   ...seed-version=20240416-180159.777000       68MiB |
+---------------------------------------------------------------------------------------+
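
The same utilization and memory figures can be polled from a script instead of read off the table. This is a sketch that assumes `nvidia-smi` is on the PATH; the function names are illustrative, and fields that nvidia-smi reports as `[N/A]` are not handled.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        gpus.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed stats (requires the NVIDIA driver tools)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Values from the table above, written in the CSV form nvidia-smi prints
sample = "4, 641, 24564\n"
print(parse_gpu_csv(sample))
```

Calling `query_gpus()` every few seconds while jobs run shows whether any stage ever touches the GPU.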

CPU usage, on the other hand, is at 100%:

    PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
1882020 jj323      39  19 4361M 1752M  209M R 100.  5.5 10:25.98 /home/jj323/anaconda3/envs/infinigen/bin/python -m infin

I did include cuda_terrain in the command. What is the problem?


araistrick commented 2 months ago

coarse is a CPU-only job which generates layouts, and it is the only job that seems to be running. The system seems to be crashing on fineterrain, so it never gets to any jobs which actually use the GPU.

Can you provide the contents of crash_summaries.txt and the .out and .err files for any crashed jobs? You can always check these files to see what is actually happening during the jobs; currently the run isn't making any progress.
My guess is that something went wrong when installing cuda terrain.
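
One way to gather the requested logs is to print the tail of every `.err` file under the output folder. This sketch assumes the `.err` files live somewhere below `outputs/my_videos`; the exact directory layout is not stated in this thread, so the search is recursive, and `tail_error_logs` is a hypothetical helper rather than an infinigen function.

```python
from pathlib import Path


def tail_error_logs(output_folder, n_lines=5):
    """Return {path: last n_lines} for every non-empty .err file under output_folder."""
    root = Path(output_folder)
    if not root.is_dir():
        return {}
    tails = {}
    for err_file in sorted(root.rglob("*.err")):
        lines = err_file.read_text(errors="replace").splitlines()
        if lines:
            tails[str(err_file)] = lines[-n_lines:]
    return tails


if __name__ == "__main__":
    # Output folder taken from the command in the original report
    for path, lines in tail_error_logs("outputs/my_videos").items():
        print(f"== {path}")
        print("\n".join(lines))
```

The last few lines of a Python traceback usually contain the exception itself, so a short tail is often enough to spot a shared cause across many crashed jobs.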

TomTomTommi commented 2 months ago

The error is 'GLIBCXX_3.4.29 not found'. The crash_summaries.txt file is attached: crash_summaries.txt. The content of one .out file is:

/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
[10:18:20.075] [root] [WARNING] | SMB_AUTH envvar is not set, smb_client upload will not work. Ignore this message if not using upload
[10:18:20.083] [infinigen.core.init] [INFO] | Converted seed='4d9f3558' to scene_seed=1302279512, parsed as hexadecimal
[10:18:20.098] [infinigen.core.execute_tasks] [INFO] | infinigen version 1.2.5
[10:18:20.098] [infinigen.core.execute_tasks] [INFO] | CUDA_VISIBLE_DEVICES=
[10:18:20.098] [infinigen.times] [INFO] | [MAIN TOTAL]
[10:18:20.098] [infinigen.core.execute_tasks] [INFO] | Processing frames 1 through 192 inclusive
[10:18:20.104] [infinigen.times] [INFO] | [terrain]
[10:18:20.104] [infinigen.times] [INFO] | [Create terrain]
[10:18:20.104] [infinigen.terrain.core] [INFO] | Terrain using only on the fly on_the_fly_asset_folder=PosixPath('/home/jj323/PycharmProjects/infinigen/outputs/my_videos/4d9f3558/coarse/assets')
[10:26:33.654] [infinigen.times] [INFO] | [Create terrain] failed with <class 'OSError'>
[10:26:33.654] [infinigen.times] [INFO] | [terrain] failed with <class 'OSError'>
[10:26:33.654] [infinigen.times] [INFO] | [MAIN TOTAL] failed with <class 'OSError'>
Traceback (most recent call last):
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jj323/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 438, in <module>
    main(args)
  File "/home/jj323/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 409, in main
    execute_tasks.main(
  File "/home/jj323/PycharmProjects/infinigen/infinigen/core/execute_tasks.py", line 418, in main
    execute_tasks(
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/core/execute_tasks.py", line 328, in execute_tasks
    compose_scene_func(output_folder, scene_seed)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 79, in compose_scene
    terrain, terrain_mesh = p.run_stage('terrain', add_coarse_terrain, use_chance=False, default=(None, None))
  File "/home/jj323/PycharmProjects/infinigen/infinigen/core/util/pipeline.py", line 76, in run_stage
    ret = fn(*args, **kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen_examples/generate_nature.py", line 75, in add_coarse_terrain
    terrain = Terrain(scene_seed, surface.registry, task='coarse', on_the_fly_asset_folder=output_folder/"assets")
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/core.py", line 126, in __init__
    self.elements, scene_infos = scene(seed, Path(on_the_fly_asset_folder), asset_path, device)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/scene.py", line 56, in scene
    elements[ElementNames.LandTiles] = LandTiles(device, caves, on_the_fly_asset_folder, reused_asset_folder)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/elements/landtiles.py", line 97, in __init__
    n_instances, tile_size, N, float_data = self.load_assets()
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/elements/landtiles.py", line 130, in load_assets
    landtile_asset(self.on_the_fly_asset_folder / tile / f"{i}", tile, device=self.device)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/assets/landtiles/core.py", line 138, in landtile_asset
    multi_mountains_asset(folder, tile_size, resolution, device)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/assets/landtiles/custom.py", line 153, in multi_mountains_asset
    if erosion: run_erosion(folder)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/land_process/erosion.py", line 31, in run_erosion
    dll = load_cdll("terrain/lib/cpu/soil_machine/SoilMachine.so")
  File "/home/jj323/PycharmProjects/infinigen/infinigen/terrain/utils/ctype_util.py", line 29, in load_cdll
    return CDLL(root/path, mode=RTLD_LOCAL)
  File "/home/jj323/anaconda3/envs/infinigen/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/jj323/PycharmProjects/infinigen/infinigen/terrain/lib/cpu/soil_machine/SoilMachine.so)
  In call to configurable 'run_erosion' (<function run_erosion at 0x7f3362bc48b0>)
  In call to configurable 'multi_mountains_asset' (<function multi_mountains_asset at 0x7f335f96acb0>)
  In call to configurable 'load_assets' (<function LandTiles.load_assets at 0x7f335f952ef0>)
  In call to configurable 'LandTiles' (<class 'infinigen.terrain.elements.landtiles.LandTiles'>)
  In call to configurable 'scene' (<function scene at 0x7f336a7c8820>)
  In call to configurable 'Terrain' (<class 'infinigen.terrain.core.Terrain'>)
  In call to configurable 'compose_scene' (<function compose_scene at 0x7f343110ab00>)
  In call to configurable 'execute_tasks' (<function execute_tasks at 0x7f335f499bd0>)
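
The failure at the bottom of this traceback can be reproduced in seconds, without waiting the roughly eight minutes the coarse job took to reach it, by loading the library directly with `ctypes`. This is a minimal sketch: the relative path comes from the traceback and assumes the script runs from the root of the infinigen checkout, and `try_load` is an illustrative helper.

```python
import ctypes


def try_load(lib_path):
    """Try to load a shared library; return None on success, else the loader's message."""
    try:
        ctypes.CDLL(str(lib_path), mode=ctypes.RTLD_LOCAL)
    except OSError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Path from the traceback above, relative to the infinigen checkout
    error = try_load("infinigen/terrain/lib/cpu/soil_machine/SoilMachine.so")
    print("loaded OK" if error is None else f"load failed: {error}")
```

If this prints the same `GLIBCXX_3.4.29` message, the problem lies in the environment's C++ runtime rather than in the scene generation itself.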

The output of strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX is below; the list stops at GLIBCXX_3.4.28, so the required GLIBCXX_3.4.29 is missing:

GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_3.4.23
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_3.4.26
GLIBCXX_3.4.27
GLIBCXX_3.4.28
GLIBCXX_DEBUG_MESSAGE_LENGTH
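
The same check can be scripted so that several copies of libstdc++ can be compared. The sketch below mirrors `strings | grep GLIBCXX` by scanning the raw bytes of the file; the function names are illustrative, and the sample bytes are a stand-in for a real library.

```python
import re


def glibcxx_versions(data):
    """Return sorted GLIBCXX version tuples found in the raw bytes of a libstdc++ file."""
    found = {
        tuple(int(part) for part in match.split(b"."))
        for match in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data)
    }
    return sorted(found)


def has_version(lib_path, required):
    """Check whether the library at lib_path advertises the required version tuple."""
    with open(lib_path, "rb") as f:
        return required in glibcxx_versions(f.read())


# Stand-in for a library's string table, mirroring the tail of the listing above
sample = b"GLIBCXX_3.4.27\x00GLIBCXX_3.4.28\x00GLIBCXX_DEBUG_MESSAGE_LENGTH\x00"
versions = glibcxx_versions(sample)
print(versions[-1], (3, 4, 29) in versions)
```

Running `has_version(path, (3, 4, 29))` on both the system library and any copy shipped inside the conda environment (for example under the environment's `lib` directory, an assumed location) would show whether a new enough libstdc++ already exists on the machine and the loader is simply picking up the older one.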

TomTomTommi commented 2 months ago

link