isaac-sim / IsaacGymEnvs

Isaac Gym Reinforcement Learning Environments

Factory environment on remote server: Segmentation fault (core dumped) #91

Open wey-code opened 1 year ago

wey-code commented 1 year ago

Thanks for the great work on IsaacGymEnvs! When I ran the demo with 'python train.py task=FactoryTaskNutBoltScrew', it crashed with 'Segmentation fault (core dumped)'. The output is as follows:

Importing module 'gym_37' (/home/slc/env/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/slc/env/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
Warning: Gym version v0.24.0 has a number of critical issues with `gym.make` such that the `reset` and `step` functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
train.py:49: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name="config", config_path="./cfg")
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.

  deprecation_warning(msg)
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  configure_logging=with_log_configuration,
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
PyTorch version 1.8.1
Device count 2
/home/slc/env/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/slc/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/slc/.cache/torch_extensions/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/home/slc/env/isaacgym/python/isaacgym/torch_utils.py:135: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
2022-11-04 16:54:09,740 - INFO - logger - logger initialized
<unknown>:6: DeprecationWarning: invalid escape sequence \*
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx'
FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/utils/tensorboard/__init__.py:3: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if not hasattr(tensorboard, '__version__') or LooseVersion(tensorboard.__version__) < LooseVersion('1.15'):
task:
    name: FactoryTaskNutBoltScrew
    physics_engine: physx
    sim:
        use_gpu_pipeline: True
        up_axis: z
        dt: 0.016667
        gravity: [0.0, 0.0, -9.81]
        disable_gravity: False
    env:
        numEnvs: 128
        numObservations: 32
        numActions: 12
    randomize:
        franka_arm_initial_dof_pos: [0.0015178, -0.19651, -0.0014364, -1.9761, -0.00027717, 1.7796, 0.78556]
        nut_rot_initial: 30.0
    rl:
        pos_action_scale: [0.1, 0.1, 0.1]
        rot_action_scale: [0.1, 0.1, 0.1]
        force_action_scale: [1.0, 1.0, 1.0]
        torque_action_scale: [1.0, 1.0, 1.0]
        unidirectional_rot: True
        unidirectional_force: False
        clamp_rot: True
        clamp_rot_thresh: 1e-06
        add_obs_finger_force: False
        keypoint_reward_scale: 1.0
        action_penalty_scale: 0.0
        max_episode_length: 4096
        far_error_thresh: 0.1
        success_bonus: 0.0
    ctrl:
        ctrl_type: operational_space_motion
        all:
            jacobian_type: geometric
            gripper_prop_gains: [100, 100]
            gripper_deriv_gains: [1, 1]
        gym_default:
            ik_method: dls
            joint_prop_gains: [40, 40, 40, 40, 40, 40, 40]
            joint_deriv_gains: [8, 8, 8, 8, 8, 8, 8]
            gripper_prop_gains: [500, 500]
            gripper_deriv_gains: [20, 20]
        joint_space_ik:
            ik_method: dls
            joint_prop_gains: [1, 1, 1, 1, 1, 1, 1]
            joint_deriv_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
        joint_space_id:
            ik_method: dls
            joint_prop_gains: [40, 40, 40, 40, 40, 40, 40]
            joint_deriv_gains: [8, 8, 8, 8, 8, 8, 8]
        task_space_impedance:
            motion_ctrl_axes: [1, 1, 1, 1, 1, 1]
            task_prop_gains: [40, 40, 40, 40, 40, 40]
            task_deriv_gains: [8, 8, 8, 8, 8, 8]
        operational_space_motion:
            motion_ctrl_axes: [0, 0, 1, 0, 0, 1]
            task_prop_gains: [1, 1, 1, 1, 1, 100]
            task_deriv_gains: [1, 1, 1, 1, 1, 1]
        open_loop_force:
            force_ctrl_axes: [0, 0, 1, 0, 0, 0]
        closed_loop_force:
            force_ctrl_axes: [0, 0, 1, 0, 0, 0]
            wrench_prop_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
        hybrid_force_motion:
            motion_ctrl_axes: [1, 1, 0, 1, 1, 1]
            task_prop_gains: [40, 40, 40, 40, 40, 40]
            task_deriv_gains: [8, 8, 8, 8, 8, 8]
            force_ctrl_axes: [0, 0, 1, 0, 0, 0]
            wrench_prop_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
train:
    params:
        seed: 42
        algo:
            name: a2c_continuous
        model:
            name: continuous_a2c_logstd
        network:
            name: actor_critic
            separate: False
            space:
                continuous:
                    mu_activation: None
                    sigma_activation: None
                    mu_init:
                        name: default
                    sigma_init:
                        name: const_initializer
                        val: 0
                    fixed_sigma: True
            mlp:
                units: [256, 128, 64]
                activation: elu
                d2rl: False
                initializer:
                    name: default
                regularizer:
                    name: None
        load_checkpoint: True
        load_path: /home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth
        config:
            name: FactoryTaskNutBoltScrew
            full_experiment_name: FactoryTaskNutBoltScrew
            env_name: rlgpu
            multi_gpu: False
            ppo: True
            mixed_precision: True
            normalize_input: True
            normalize_value: True
            value_bootstrap: True
            num_actors: 128
            reward_shaper:
                scale_value: 1.0
            normalize_advantage: True
            gamma: 0.99
            tau: 0.95
            learning_rate: 0.0001
            lr_schedule: fixed
            schedule_type: standard
            kl_threshold: 0.016
            score_to_win: 20000
            max_epochs: 1024
            save_best_after: 50
            save_frequency: 100
            print_stats: True
            grad_norm: 1.0
            entropy_coef: 0.0
            truncate_grads: False
            e_clip: 0.2
            horizon_length: 32
            minibatch_size: 512
            mini_epochs: 8
            critic_coef: 2
            clip_value: True
            seq_len: 4
            bounds_loss_coef: 0.0001
            device: cuda:0
task_name: FactoryTaskNutBoltScrew
experiment:
num_envs:
seed: 42
torch_deterministic: False
max_iterations:
physics_engine: physx
pipeline: gpu
sim_device: cuda:0
rl_device: cuda:0
graphics_device_id: 0
num_threads: 4
solver_type: 1
num_subscenes: 4
test: False
checkpoint: /home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth
multi_gpu: False
wandb_activate: False
wandb_group:
wandb_name: FactoryTaskNutBoltScrew
wandb_entity:
wandb_project: isaacgymenvs
capture_video: False
capture_video_freq: 1464
capture_video_len: 100
force_render: True
headless: False
Setting seed: 42
self.seed = 42
Started to train
Exact experiment name requested from command line: FactoryTaskNutBoltScrew
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/gym/spaces/box.py:112: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/tasks/factory/factory_base.py:507: UserWarning: WARN: Please be patient: SDFs may be generating, which may take a few minutes. Terminating prematurely may result in a corrupted SDF cache.
  logger.warn('Please be patient: SDFs may be generating, which may take a few minutes. Terminating prematurely may result in a corrupted SDF cache.')
Using SDF cache directory '/home/slc/.isaacgym/sdf_V100'
~!~!~! Loaded/Cooked SDF triangle mesh 0 @ 0x55b23f95ae80, resolution=256, spacing=0.000108
  ~!~! Bounds:  (-0.012000, 0.012000) (-0.013856, 0.013856) (0.016000, 0.029000)
  ~!~! Extents: (0.024000, 0.027712, 0.013000)
  ~!~! Resolution: (222, 256, 121)
~!~!~! Loaded/Cooked SDF triangle mesh 1 @ 0x55b245831410, resolution=512, spacing=0.000080
  ~!~! Bounds:  (-0.012000, 0.012000) (-0.012000, 0.012000) (0.000000, 0.041000)
  ~!~! Extents: (0.024000, 0.024000, 0.041000)
  ~!~! Resolution: (300, 300, 512)
~!~!~! Loaded/Cooked SDF triangle mesh 2 @ 0x55b23e473a10, resolution=256, spacing=0.000108
  ~!~! Bounds:  (-0.012000, 0.012000) (-0.013856, 0.013856) (0.016000, 0.029000)
  ~!~! Extents: (0.024000, 0.027712, 0.013000)
  ~!~! Resolution: (222, 256, 121)
~!~!~! Loaded/Cooked SDF triangle mesh 3 @ 0x55b249372f10, resolution=512, spacing=0.000080
  ~!~! Bounds:  (-0.012000, 0.012000) (-0.012000, 0.012000) (0.000000, 0.041000)
  ~!~! Extents: (0.024000, 0.024000, 0.041000)
  ~!~! Resolution: (300, 300, 512)
Box(-1.0, 1.0, (12,), float32) Box(-inf, inf, (32,), float32)
current training device: cuda:0
build mlp: 32
RunningMeanStd:  (1,)
RunningMeanStd:  (32,)
=> loading checkpoint '/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth'
/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/tasks/factory/factory_control.py:145: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  task_wrench = task_wrench + torch.tensor(cfg_ctrl['motion_ctrl_axes'], device=device).unsqueeze(0) * task_wrench_motion
Unhandled descriptor set 433
Unhandled descriptor set 1176522960
Unhandled descriptor set 1177337120
Segmentation fault (core dumped)
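The log above ends in a segfault, and clues such as the lavapipe warning are easy to miss in this much output. Below is a minimal sketch of a hypothetical diagnostic helper. The names diagnose and run_and_diagnose are illustrative and are not part of IsaacGymEnvs. It checks whether the process was killed by SIGSEGV and scans the captured output for two strings copied from this log. lavapipe is Mesa's software (CPU) Vulkan driver. Treating its presence as a likely culprit for the viewer crash is an assumption, not a confirmed root cause.

```python
import os
import signal
import subprocess
import sys
from typing import List

# Strings copied verbatim from the log above. Treating them as red flags is an
# assumption: lavapipe is Mesa's software Vulkan driver, which suggests the
# viewer is not rendering through the NVIDIA driver.
SOFTWARE_VULKAN_MARKER = "lavapipe is not a conformant vulkan implementation"
DESCRIPTOR_MARKER = "Unhandled descriptor set"


def diagnose(returncode: int, log_text: str) -> List[str]:
    """Return human-readable findings for one finished run (hypothetical helper)."""
    findings = []
    # On POSIX, subprocess reports death by signal N as return code -N.
    if returncode == -signal.SIGSEGV:
        findings.append("killed by SIGSEGV (segmentation fault)")
    # A shell reports the same crash as exit status 128 + 11 = 139.
    elif returncode == 128 + signal.SIGSEGV:
        findings.append("exit status 139: segmentation fault reported by a shell")
    if SOFTWARE_VULKAN_MARKER in log_text:
        findings.append("Vulkan reported lavapipe (software renderer), not the GPU driver")
    count = log_text.count(DESCRIPTOR_MARKER)
    if count:
        findings.append(f"{count} 'Unhandled descriptor set' message(s) before exit")
    return findings


def run_and_diagnose(cmd: List[str]) -> List[str]:
    """Run cmd, capture stdout and stderr together, then diagnose the result."""
    # Unbuffered Python output, so lines printed just before a crash are not lost.
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, env=env)
    return diagnose(proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Hypothetical invocation; run from the directory that contains train.py.
    for finding in run_and_diagnose([sys.executable, "train.py", "task=FactoryTaskNutBoltScrew"]):
        print(finding)
```

PYTHONUNBUFFERED only affects Python's own streams. Messages that native code writes through C stdio can still be lost if they were sitting in a buffer when the process died.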

If I pass 'headless=True', the error disappears. I know that the remote server needs a graphical display, so I use X11. This setup works well for other demos such as Ant and ShadowHand, so I am confused about why the error appears only in the Factory environment. I also tried the approach from #22 (updating the NVIDIA driver), but that doesn't work either.
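The reporter's workaround is the 'headless=True' override, which matches 'headless: False' in the printed config, and it can be applied automatically. Below is a minimal sketch of a hypothetical launcher helper; build_train_args and display_is_remote are illustrative names, not part of IsaacGymEnvs. It adds headless=True when DISPLAY is unset or looks like an SSH-forwarded display. The DISPLAY heuristic is an assumption. It also disables the viewer for tasks such as Ant and ShadowHand, which reportedly render fine over X11.

```python
import os
import sys
from typing import List, Mapping, Optional


def display_is_remote(display: str) -> bool:
    """Heuristic (assumption): SSH-forwarded X11 displays usually carry a host
    part, e.g. 'localhost:10.0', while local displays look like ':0' or 'unix:0'."""
    if ":" not in display:
        return False
    host = display.split(":", 1)[0]
    return host not in ("", "unix")


def build_train_args(task: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build train.py arguments, adding headless=True when no usable local display exists."""
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "")
    args = ["train.py", f"task={task}"]
    if not display or display_is_remote(display):
        # Same workaround as in the report: skip the viewer entirely.
        args.append("headless=True")
    return args


if __name__ == "__main__":
    # Print the command that would be launched from the directory containing train.py.
    print(" ".join([sys.executable] + build_train_args("FactoryTaskNutBoltScrew")))
```

To launch training directly, pass the same list to subprocess.run, for example subprocess.run([sys.executable] + build_train_args("FactoryTaskNutBoltScrew")).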

ChenyangRan commented 1 year ago

Hi, I have run into the same error. Have you solved it?

wey-code commented 1 year ago

Not yet.