RuntimeError: nvrtc: error: failed to load builtins for compute_80

zichunxx commented 5 months ago

Hi! @mihdalal! Thanks again for checking this issue.

When I tried to train the robosuite lift task, I encountered the following error:

/home/xzc/miniforge3/envs/raps/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
TRAINING
Traceback (most recent call last):
  File "experiments/robosuite/dreamer/dreamer_v2_single_task_primitives_lift.py", line 182, in <module>
    exp_id=exp_id,
  File "/home/xzc/Documents/raps/rlkit/rlkit/launchers/launcher_util.py", line 590, in run_experiment
    return run_experiment_here(method_call, **run_experiment_kwargs)
  File "/home/xzc/Documents/raps/rlkit/rlkit/launchers/launcher_util.py", line 168, in run_experiment_here
    return experiment_function(variant)
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/dreamer/experiments/kitchen_dreamer.py", line 196, in experiment
    algorithm.train()
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/rl_algorithm.py", line 52, in train
    self._train()
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/rl_algorithm.py", line 277, in _train
    self.num_eval_steps_per_epoch,
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/dreamer/path_collector.py", line 54, in collect_new_paths
    render_kwargs=self._render_kwargs,
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/dreamer/rollout_functions.py", line 63, in vec_rollout
    a, agent_info = agent.get_action(o_for_agent, **get_action_kwargs)
  File "/home/xzc/miniforge3/envs/raps/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/xzc/Documents/raps/rlkit/rlkit/torch/model_based/dreamer/dreamer_policy.py", line 49, in get_action
    embed = self.world_model.encode(observation)
RuntimeError: nvrtc: error: failed to load builtins for compute_80.
nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void func_1(float* t0, float* aten_sub_flat) {
{
  float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
  aten_sub_flat[512 * blockIdx.x + threadIdx.x] = v / 255.f - 0.5f;
}
}

Could this error be related to the CUDA version?

I installed torch 1.7.1+cu110 as specified in requirements.txt, with Python 3.7 on Ubuntu 20.04. My CUDA version is 11.2, and the GPU (RTX 3070 Ti) is detected by this torch version.
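For reference, the details listed above (torch build, the CUDA version it was built against, and what it sees of the GPU) can be gathered from inside the environment. The following is only a minimal sketch: `collect_env_info` is a hypothetical helper name, not part of the raps repo, and it assumes the first GPU (index 0) is the one of interest.

```python
def collect_env_info():
    """Collect the values relevant to a CUDA/PyTorch mismatch into a dict.

    Hypothetical helper for illustration; not part of raps or rlkit.
    """
    info = {
        "torch_version": None,
        "built_for_cuda": None,
        "gpu_visible": False,
        "gpu_name": None,
        "gpu_capability": None,
        "compiled_archs": None,
    }
    try:
        import torch
    except ImportError:
        return info  # torch is not installed in this environment

    info["torch_version"] = torch.__version__
    info["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["gpu_visible"] = bool(torch.cuda.is_available())

    if info["gpu_visible"]:
        # Assumption: device 0 is the GPU used for training.
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["gpu_capability"] = torch.cuda.get_device_capability(0)
        # Looked up defensively in case an older release lacks this function.
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        if get_arch_list is not None:
            info["compiled_archs"] = get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Note that `built_for_cuda` reports the CUDA version the torch wheel was built against, which can differ from the system-wide CUDA toolkit version.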

Is there a mismatch with my settings?

Kind thanks.

zichunxx commented 5 months ago

I tried downgrading CUDA from 11.2 to 10.1 and installing torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2, but this triggers a different compatibility issue:

NVIDIA GeForce RTX 3070 Ti with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce RTX 3070 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
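The message above can be read as a simple membership check: the GPU reports capability sm_86, and none of the architectures compiled into this build share its major version. A minimal sketch of that check follows. It is not PyTorch's own code; the function names are hypothetical, and it assumes the usual CUDA binary-compatibility rule that code built for capability X.y runs on devices of capability X.z with z >= y.

```python
def parse_sm(arch):
    """Turn a string such as 'sm_75' into a (major, minor) tuple."""
    digits = int(arch.split("_")[1])
    return digits // 10, digits % 10


def is_capability_covered(capability, arch_list):
    """Return True if some compiled 'sm_XY' binary can run on the device.

    Assumption: a binary for X.y runs on a device X.z only when z >= y.
    Entries that are not plain 'sm_<digits>' (e.g. 'compute_80') are ignored.
    """
    dev_major, dev_minor = capability
    for arch in arch_list:
        suffix = arch.split("_")[-1]
        if not arch.startswith("sm_") or not suffix.isdigit():
            continue
        major, minor = parse_sm(arch)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


# Architecture list copied from the warning above; (8, 6) is the RTX 3070 Ti.
cu101_archs = "sm_37 sm_50 sm_60 sm_70 sm_75".split()
print(is_capability_covered((8, 6), cu101_archs))              # False
# Hypothetical list that additionally contains sm_80:
print(is_capability_covered((8, 6), cu101_archs + ["sm_80"]))  # True
```

Under this rule, any build whose architecture list stops at sm_75 cannot cover an sm_86 device, which is consistent with the cu101 install being rejected.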

Would you mind sharing which torch version can be used to reproduce the training process?

mihdalal commented 5 months ago

I'm not entirely sure what is causing these errors, but I would recommend searching for them online. This looks like an issue with your CUDA/torch setup; you may need to try a more recent version or upgrade your drivers.

zichunxx commented 5 months ago

@mihdalal Thanks for your kind reply! I'll try some other CUDA/PyTorch versions.

I really like your idea. In case I don't succeed in running the code, I have some questions about the core of the article that I hope you can answer, so I can check that I understand it correctly.

I spent some time browsing the code. As I understand it, the main innovation of your article is implemented in the act function of RobosuitePrimitives, which is used when the control mode is primitives and the task is a robosuite task.

Taking the 10 primitives in RobosuitePrimitives as an example, the RL algorithm outputs a distribution over an action space of size 20, where the first ten entries represent the distribution over primitives and the last ten entries are the corresponding primitive parameters. The act function selects the primitive with the highest probability and passes its parameters to the controller.
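To make sure I am describing the same thing, here is a minimal sketch of the selection step as I understand it. This is not the actual code from the repo: `select_primitive` and the primitive names are hypothetical, and it assumes exactly one scalar parameter per primitive, as in my description above.

```python
def select_primitive(action, primitive_names):
    """Split a flat action into primitive scores and parameters.

    Assumptions (from the description above, not from the raps source):
    the first len(primitive_names) entries score the primitives, and the
    remaining entries hold one scalar parameter per primitive.
    """
    n = len(primitive_names)
    if len(action) != 2 * n:
        raise ValueError(f"expected action of size {2 * n}, got {len(action)}")
    scores = action[:n]
    params = action[n:]
    # Pick the primitive with the highest score (first one wins on ties).
    best = max(range(n), key=lambda i: scores[i])
    return primitive_names[best], params[best]


# Hypothetical primitive names, for illustration only.
names = [f"primitive_{i}" for i in range(10)]
scores = [0.0] * 10
scores[3] = 1.0                               # primitive_3 has the highest score
params = [0.1 * i for i in range(10)]
name, param = select_primitive(scores + params, names)
print(name, param)
```

With this layout, only the wrapper needs to know how the flat action vector is split; the RL algorithm just sees a continuous action space of size 20.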

As a result, this wrapper can be adapted to different tasks without making extensive changes to the task itself or the RL algorithm. Is that right?

Thanks for your patience.

mihdalal commented 5 months ago

Yes that is correct!

mihdalal / raps

RuntimeError: nvrtc: error: failed to load builtins for compute_80 #13