isaac-sim / OmniIsaacGymEnvs

Reinforcement Learning Environments for Omniverse Isaac Gym

How to change numEnvs? #162

Open xuyisen-x opened 1 month ago

xuyisen-x commented 1 month ago

In task AnymalTerrain, I tried to change the numEnvs from 2048 to 4096, by running the following command:

python scripts/rlgames_train.py task=AnymalTerrain headless=True num_threads=32 num_envs=4096 max_iterations=10 task.sim.physx.gpu_found_lost_pairs_capacity=5194304 task.sim.physx.gpu_found_lost_aggregate_pairs_capacity=71331648 task.sim.physx.gpu_total_aggregate_pairs_capacity=5194304

Then I got "Fatal Python error: Segmentation fault" at the end of training:

Fatal Python error: Segmentation fault

Thread 0x0000725ec0960640 (most recent call first):
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 320 in wait
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/multiprocessing/queues.py", line 231 in _feed
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 953 in run
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000725ec8fff640 (most recent call first):
  <no Python frame>

Thread 0x0000725ed0960640 (most recent call first):
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/selectors.py", line 416 in select
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/multiprocessing/connection.py", line 931 in wait
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/multiprocessing/connection.py", line 424 in _poll
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/multiprocessing/connection.py", line 257 in poll
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/multiprocessing/queues.py", line 113 in get
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 202 in run
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/home/xuyisen/miniconda3/envs/isaac-sim/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007263d7e30740 (most recent call first):
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/exts/omni.isaac.core/omni/isaac/core/simulation_context/simulation_context.py", line 729 in render
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/exts/omni.isaac.core/omni/isaac/core/simulation_context/simulation_context.py", line 953 in stop
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/exts/omni.isaac.gym/omni/isaac/gym/vec_env/vec_env_base.py", line 188 in close
  File "/home/xuyisen/projects/OmniIsaacGymEnvs/omniisaacgymenvs/scripts/rlgames_train.py", line 167 in parse_hydra_configs
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/core/utils.py", line 186 in run_job
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 119 in run
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458 in <lambda>
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220 in run_and_report
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457 in _run_app
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394 in _run_hydra
  File "/home/xuyisen/.local/share/ov/pkg/isaac_sim-2023.1.1/kit/python/lib/python3.10/site-packages/hydra/main.py", line 94 in decorated_main
  File "/home/xuyisen/projects/OmniIsaacGymEnvs/omniisaacgymenvs/scripts/rlgames_train.py", line 174 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yaml._yaml, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, gmpy2.gmpy2, _brotli, google.protobuf.pyext._message, psutil._psutil_linux, psutil._psutil_posix, omni.mdl.pymdlsdk._pymdlsdk, pydantic.typing, pydantic.version, pydantic.utils, pydantic.color, pydantic.datetime_parse, pydantic.types, pydantic.error_wrappers, pydantic.parse, pydantic.annotated_types, pydantic.decorator, pydantic.tools, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg._cythonized_array_utils, scipy.linalg._flinalg, scipy.linalg._solve_toeplitz, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_lapack, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.optimize._minpack2, scipy.optimize._group_columns, 
scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython (total: 102)

I must have missed something. What should I do? Are there any other configs I need to change?

eferreirafilho commented 1 week ago

Are you monitoring VRAM? It looks like you may have run out of VRAM with those settings.
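
One way to check this is to poll nvidia-smi from a second terminal while training runs. The snippet below is a minimal sketch, not part of OmniIsaacGymEnvs: the function names, the 1-second interval, and the 95% warning threshold are all illustrative assumptions, and it assumes nvidia-smi is on PATH and reports numeric memory values for each GPU.

```python
import subprocess
import time


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs (one GPU per line) into a list of tuples."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory (MiB) of every visible GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_csv(out)


def monitor(interval_s=1.0, warn_fraction=0.95):
    """Print one reading per GPU every interval_s seconds until interrupted."""
    while True:
        for idx, (used, total) in enumerate(query_gpu_memory()):
            # Flag readings close to the card's capacity (threshold is arbitrary).
            flag = "  <-- near capacity" if used >= warn_fraction * total else ""
            print(f"GPU {idx}: {used} / {total} MiB{flag}")
        time.sleep(interval_s)
```

Calling monitor() in a separate Python session while the num_envs=4096 run is active and comparing the peak against a num_envs=2048 run would show whether memory is close to the limit. If usage stays well below capacity, the crash is probably caused by something else, since the trace shows the segfault inside render() during stop()/close(), after training has finished.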