4090 training compatibility on MapTr_v2

alfredgu001324 commented 1 year ago

Hi, thanks for your great work. I just started exploring MapTR_v2, and as soon as I started training I hit this error:

2023-09-19 17:19:58,978 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2023-09-19 17:19:58,979 - mmdet - INFO - Checkpoints will be saved to /home/guxunjia/project/MapTR_v2/work_dirs/maptrv2_nusc_r50_24ep_w_centerline by HardDiskBackend.
/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/models/utils/grid_mask.py:114: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:180.)
  mask = torch.from_numpy(mask).to(x.dtype).cuda()
Traceback (most recent call last):
  File "tools/train.py", line 259, in <module>
    main()
  File "tools/train.py", line 248, in main
    custom_train_model(
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
    custom_train_detector(
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 199, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 29, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/detectors/maptrv2.py", line 197, in forward
    return self.forward_train(**kwargs)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/detectors/maptrv2.py", line 315, in forward_train
    losses_pts = self.forward_pts_train(img_feats, lidar_feat, gt_bboxes_3d,
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/detectors/maptrv2.py", line 145, in forward_pts_train
    outs = self.pts_bbox_head(
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/dense_heads/maptrv2_head.py", line 345, in forward
    outputs = self.transformer(
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/transformer.py", line 365, in forward
    ouput_dic = self.get_bev_features(
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/transformer.py", line 268, in get_bev_features
    ret_dict = self.lss_bev_encode(
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/transformer.py", line 230, in lss_bev_encode
    encoder_outputdict = self.encoder(images,img_metas)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/encoder.py", line 1110, in forward
    x, depth = super().forward(images, img_metas)
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/encoder.py", line 282, in forward
    geom = self.get_geometry_v1(
  File "/home/guxunjia/anaconda3/envs/maptr_v2/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/guxunjia/project/MapTR_v2/projects/mmdet3d_plugin/maptr/modules/encoder.py", line 115, in get_geometry_v1
    torch.inverse(post_rots)
RuntimeError: CUDA error: operation not supported when calling cusparseCreate(handle)

I have been using an RTX 4090 with MapTR for the past few weeks, and everything has worked fine. With the MapTR_v2 environment setup, however, training fails as shown above. I suspect the new mmdetection folder in MapTR_v2 introduces an incompatibility.

I also think this problem is specific to the 4090 (see https://github.com/facebookresearch/pytorch3d/issues/1399).
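To help narrow this down, here is a small diagnostic sketch that prints the PyTorch build details of the training environment. It rests on an assumption that is not confirmed anywhere in this thread: that the failure comes from a PyTorch build linked against a CUDA toolkit older than 11.8, the first CUDA release with official support for Ada GPUs such as the RTX 4090 (compute capability 8.9). The helper names cuda_at_least and report are made up for illustration.

```python
def cuda_at_least(version, minimum=(11, 8)):
    """Return True if a CUDA version string such as '11.1' is >= minimum.

    The (11, 8) default reflects the assumption stated above: CUDA 11.8 is
    the first toolkit release with official Ada (sm_89) support.
    """
    if not version:  # CPU-only PyTorch builds report None here
        return False
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum


def report():
    """Print the PyTorch / CUDA details relevant to this error."""
    import torch  # imported lazily so cuda_at_least works without PyTorch

    print("torch:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        print("compute capability:", torch.cuda.get_device_capability(0))
        print("compiled arch list:", torch.cuda.get_arch_list())
    print("CUDA >= 11.8:", cuda_at_least(torch.version.cuda))
```

Calling report() inside the maptr_v2 conda environment shows which CUDA version the installed PyTorch was built with. If it is older than 11.8, that would be consistent with the assumption above, but it does not prove it.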

Could you look into this a bit and make it compatible with the RTX 4090? Thanks!

alfredgu001324 commented 1 year ago

An ugly workaround is to change every torch.inverse call so that the inversion runs on the CPU and the result is moved back to the GPU, as follows:

torch.inverse(lidar2ego_rots.to("cpu")).to("cuda:0")
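The same workaround can be wrapped in one helper so that each call site changes in a single place. This is only a sketch: inverse_on_cpu is a made-up name, not part of MapTR_v2 or PyTorch, and it returns the result on the input's own device instead of hard-coding "cuda:0".

```python
import torch


def inverse_on_cpu(mat):
    """Invert square matrices on the CPU and return the result on mat's device.

    Hypothetical helper mirroring the workaround above. It avoids the GPU
    inverse path that raised the error in the traceback, at the cost of a
    device round trip on every call.
    """
    return torch.inverse(mat.cpu()).to(mat.device)
```

A call such as torch.inverse(post_rots) in encoder.py would then become inverse_on_cpu(post_rots). Using mat.device keeps the code working when the tensor lives on a GPU other than cuda:0, for example in multi-GPU training.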

hctian713 commented 9 months ago

An ugly workaround is to change every torch.inverse call so that the inversion runs on the CPU and the result is moved back to the GPU, as follows:

torch.inverse(lidar2ego_rots.to("cpu")).to("cuda:0")

useful!

hustvl / MapTR

4090 training compatibility on MapTr_v2 #111