ir413 / mvp

Masked Visual Pre-training for Robotics

72956 segmentation fault #17

Open zichunxx opened 6 months ago

zichunxx commented 6 months ago

Hi! Thanks for sharing this great work!

I get a segmentation fault (the shell reports "72956 segmentation fault") when I try to train a task with the Pixels suffix, such as FrankaPickPixels.

Training tasks without the Pixels suffix finishes successfully, so the segmentation fault does not seem to be triggered by PyTorch.

I'm using Isaac Gym Preview 4 on Ubuntu 20.04.

Here is the output of running python tools/train_ppo.py task=FrankaPickPixels:

Importing module 'gym_37' (/home/xzc/Downloads/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/xzc/Downloads/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.10.0
Device count 1
/home/xzc/Downloads/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/xzc/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Emitting ninja build file /home/xzc/.cache/torch_extensions/py37_cu113/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/home/xzc/Downloads/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/torch_utils.py:135: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
tools/train_ppo.py:13: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_name="config", config_path="../configs/ppo")
/home/xzc/mambaforge/envs/mvp/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/home/xzc/mambaforge/envs/mvp/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/defaults_list_override for more information.

  deprecation_warning(msg)
/home/xzc/mambaforge/envs/mvp/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  configure_logging=with_log_configuration,
task: 
    name: FrankaPick
    env: 
        numEnvs: 256
        envSpacing: 1.5
        episodeLength: 500
        object_pos_init: [0.5, 0.0]
        object_pos_delta: [0.1, 0.2]
        goal_height: 0.8
        obs_type: pixels
        im_size: 224
        cam: 
            w: 298
            h: 224
            fov: 120
            ss: 2
            loc_p: [0.04, 0.0, 0.045]
            loc_r: [180, -90.0, 0.0]
        dofVelocityScale: 0.1
        actionScale: 7.5
        objectDistRewardScale: 0.08
        liftBonusRewardScale: 4.0
        goalDistRewardScale: 1.28
        goalBonusRewardScale: 4.0
        actionPenaltyScale: 0.01
        asset: 
            assetRoot: assets
            assetFileNameFranka: urdf/franka_description/robots/franka_panda.urdf
    sim: 
        substeps: 1
        physx: 
            num_threads: 4
            solver_type: 1
            num_position_iterations: 12
            num_velocity_iterations: 1
            contact_offset: 0.005
            rest_offset: 0.0
            bounce_threshold_velocity: 0.2
            max_depenetration_velocity: 1000.0
            default_buffer_size_multiplier: 5.0
            always_use_articulations: False
    task: 
        randomize: False
train: 
    seed: 0
    torch_deterministic: False
    encoder: 
        name: vits-mae-hoi
        pretrain_dir: /home/xzc/Documents/mvp/tmp/pretrained
        freeze: True
        emb_dim: 128
    policy: 
        pi_hid_sizes: [256, 128, 64]
        vf_hid_sizes: [256, 128, 64]
    learn: 
        agent_name: franka_ppo
        test: False
        resume: 0
        save_interval: 50
        print_log: True
        max_iterations: 2000
        cliprange: 0.1
        ent_coef: 0
        nsteps: 32
        noptepochs: 10
        nminibatches: 4
        max_grad_norm: 1
        optim_stepsize: 0.001
        schedule: cos
        gamma: 0.99
        lam: 0.95
        init_noise_std: 1.0
        log_interval: 1
physics_engine: physx
pipeline: gpu
sim_device: cuda:0
rl_device: cuda:0
graphics_device_id: 0
num_gpus: 1
test: False
resume: 0
logdir: /home/xzc/Documents/mvp/tmp/debug
cptdir: 
headless: True
Wrote config to: /home/xzc/Documents/mvp/tmp/debug/config.yaml
Setting seed: 0
Setting sim options
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
num franka bodies:  11
num franka dofs:  9
[1]    72956 segmentation fault  python tools/train_ppo.py task=FrankaPickPixels

Looking forward to any comments! Thanks!
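
The log above stops right after the Franka asset is reported, with no Python traceback. A minimal sketch for narrowing this down, assuming the crash happens inside a native call made from Python: the standard-library faulthandler module can dump the Python stack when the process receives a fatal signal such as SIGSEGV. The helper name enable_crash_traceback and the log file name are hypothetical and not part of mvp or Isaac Gym.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Dump the Python stack of all threads to log_path on a fatal signal."""
    # The file object must stay open for the lifetime of the process,
    # because faulthandler writes to its file descriptor at crash time.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; any writable path works.
crash_log = enable_crash_traceback(
    os.path.join(tempfile.gettempdir(), "mvp_crash_traceback.log")
)
```

Calling this at the top of the training script, before the environment is created, should record which Python line was executing when the segmentation fault occurred. Running the script with python -X faulthandler has the same effect without editing any code, with the traceback going to stderr.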

Virlus commented 4 weeks ago

I have exactly the same issue. I tried the solutions from the NVIDIA forum, but they didn't work. I also tried changing the config as follows:

physics_engine: "physx"
pipeline: "gpu"
sim_device: "cuda:1"
rl_device: "cuda:1"
graphics_device_id: 1
num_gpus: 1

With this config, a different error occurs:

[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on depthImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on segmentationImage buffer with error 101
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on optical flow buffer with error 101
*** Can't create empty tensor

I then tried the solution from this post, but it fails as well. Hopefully the authors can suggest a workaround. Thanks!
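
In the CUDA runtime, error code 101 is cudaErrorInvalidDevice (invalid device ordinal), so one thing worth ruling out is a device index in the config that does not exist on the machine. The following is a minimal sketch of such a check, not a confirmed diagnosis. It assumes that sim_device, rl_device, and graphics_device_id can all be compared against the same device count, which may not hold if the graphics API enumerates GPUs differently from CUDA. The function names are hypothetical.

```python
def parse_cuda_index(device):
    """Return the ordinal from a device string such as 'cuda:1' (0 if none is given)."""
    _, _, index = device.partition(":")
    return int(index) if index else 0


def find_invalid_devices(cfg, device_count):
    """List every config entry whose device index is not below device_count."""
    requested = {
        "sim_device": parse_cuda_index(cfg["sim_device"]),
        "rl_device": parse_cuda_index(cfg["rl_device"]),
        "graphics_device_id": int(cfg["graphics_device_id"]),
    }
    return [
        f"{key}={index} but only {device_count} device(s) are visible"
        for key, index in requested.items()
        if index >= device_count
    ]


# Values copied from the config above.
cfg = {"sim_device": "cuda:1", "rl_device": "cuda:1", "graphics_device_id": 1}

# device_count=1 is for illustration only; in practice it could come
# from torch.cuda.device_count().
for problem in find_invalid_devices(cfg, device_count=1):
    print(problem)
```

If the list is empty on the affected machine, the indices are at least in range and the cause lies elsewhere.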