youngskkim / CRN

[ICCV'23] Official implementation of CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception
MIT License

CUDA error: operation not supported when calling `cusparseCreate(handle)` on 4090 GPU #12

Open forever-free opened 4 months ago

forever-free commented 4 months ago

I ran `python exps/det/CRN_r18_256x704_128x128_4key.py --amp_backend native -b 1 --gpus 1` and got the errors below:

/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:240: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 40 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0: 0%| | 0/29634 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "exps/det/CRN_r18_256x704_128x128_4key.py", line 359, in <module>
    run_cli(CRNLightningModel,
  File "/work/data06/git_/CRN/exps/base_cli.py", line 85, in run_cli
    trainer.fit(model)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/native_amp.py", line 85, in optimizer_step
    closure_result = closure()
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
    return self.model(*args, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "exps/det/CRN_r18_256x704_128x128_4key.py", line 298, in training_step
    preds, depth_preds = self(sweep_imgs, mats,
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "exps/det/CRN_r18_256x704_128x128_4key.py", line 281, in forward
    return self.model(sweep_imgs, mats, sweep_ptss=inputs['pts_pv'], is_train=is_train)
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/data06/git_/CRN/models/camera_radar_net_det.py", line 90, in forward
    feats, depth, _ = self.backbone_img(sweep_imgs,
  File "/home/iot/miniconda3/envs/CRN/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/data06/git_/CRN/layers/backbones/rvt_lss_fpn.py", line 397, in forward
    key_frame_res = self._forward_single_sweep(
  File "/work/data06/git_/CRN/layers/backbones/rvt_lss_fpn.py", line 304, in _forward_single_sweep
    geom_xyz, geom_xyz_valid = self.get_geometry_collapsed(
  File "/work/data06/git_/CRN/layers/backbones/rvt_lss_fpn.py", line 195, in get_geometry_collapsed
    points = ida_mat.inverse().matmul(points.unsqueeze(-1)).double()
RuntimeError: CUDA error: operation not supported when calling `cusparseCreate(handle)`
Epoch 0: 0%| | 0/29634 [00:11<?, ?it/s]

hcy-hust commented 3 months ago

Hello, have you solved the problem yet?

chloexiangyy commented 2 months ago

Try changing these lines in base_lss_fpn.py and rvt_lss_fpn.py so that the matrix inverse is computed on the CPU:

before: points = ida_mat.inverse().matmul(points.unsqueeze(-1)).double()
after: points = (ida_mat.cpu().inverse().cuda()).matmul(points.unsqueeze(-1)).double()

before: combine = sensor2ego_mat.matmul(torch.inverse(intrin_mat)).double()
after: combine = sensor2ego_mat.matmul(torch.inverse(intrin_mat.cpu()).cuda()).double()
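
The same workaround can be wrapped in a small helper so that both call sites share one code path. This is only a sketch: `inverse_on_cpu` is a hypothetical name, not something in the CRN repository, and the float32 promotion assumes the matrices might arrive in half precision under AMP (the CPU inverse does not accept float16).

```python
import torch


def inverse_on_cpu(mat: torch.Tensor) -> torch.Tensor:
    """Invert square matrices on the CPU, then restore the original device and dtype.

    Hypothetical helper for illustration; it is not part of the CRN code base.
    """
    cpu_mat = mat.cpu()
    # Assumption: inputs may be half precision under AMP. The CPU inverse
    # is not implemented for 16-bit floats, so promote them to float32.
    if cpu_mat.dtype in (torch.float16, torch.bfloat16):
        cpu_mat = cpu_mat.float()
    inv = torch.inverse(cpu_mat)
    # Move back to wherever the input lived instead of hard-coding .cuda().
    return inv.to(device=mat.device, dtype=mat.dtype)


# Small CPU-only demonstration with a diagonal matrix.
demo = torch.tensor([[2.0, 0.0], [0.0, 4.0]])
demo_inv = inverse_on_cpu(demo)
```

With such a helper, the two edited lines would read `points = inverse_on_cpu(ida_mat).matmul(points.unsqueeze(-1)).double()` and `combine = sensor2ego_mat.matmul(inverse_on_cpu(intrin_mat)).double()`. Returning to `mat.device` rather than calling `.cuda()` keeps the result on the same GPU as the input when more than one device is in use.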

youngskkim commented 1 month ago

I didn't hit this error on a 3090 GPU, so I am not sure, but I think the error is related to the data type (half, float, double).
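
One way to check the data-type guess is to record the dtype and device of the matrix just before the failing `inverse()` call, together with the PyTorch and CUDA build versions. The snippet below is a rough sketch; `inverse_debug_info` is a hypothetical helper, not part of CRN, and it only reports information without changing any behavior.

```python
import torch


def inverse_debug_info(mat: torch.Tensor) -> dict:
    """Collect dtype, device, and build details for a tensor about to be inverted.

    Hypothetical diagnostic helper; not part of the CRN code base.
    """
    info = {
        "dtype": str(mat.dtype),
        "device": str(mat.device),
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None on CPU-only builds.
        "cuda_build": torch.version.cuda,
    }
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


# CPU-only demonstration with a half-precision matrix.
report = inverse_debug_info(torch.eye(3).half())
```

Calling `print(inverse_debug_info(ida_mat))` right before line 195 of rvt_lss_fpn.py would show whether the matrix is half, float, or double at that point, and posting the output in the issue would make the reports from the 3090 and 4090 machines directly comparable.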