I ran into a bug while reproducing your work

USTBhyh commented 5 months ago

Hello, I encountered a bug while reproducing your work; the traceback is below. I don't think anything is wrong with my environment configuration, but the problem has occurred on every attempt to reproduce. What could be causing it?

Traceback (most recent call last):
  File "opencood/tools/train.py", line 189, in <module>
    main()
  File "opencood/tools/train.py", line 118, in main
    ouput_dict = model(batch_data['ego'])
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/space0/houyh/workspace/HEAL/opencood/models/heter_pyramid_collab.py", line 145, in forward
    feature = eval(f"self.encoder_{modality_name}")(data_dict, modality_name)
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/space0/houyh/workspace/HEAL/opencood/models/heter_encoders.py", line 47, in forward
    batch_dict = self.pillar_vfe(batch_dict)
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/space0/houyh/workspace/HEAL/opencood/models/sub_modules/pillar_vfe.py", line 151, in forward
    features = pfn(features)
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/space0/houyh/workspace/HEAL/opencood/models/sub_modules/pillar_vfe.py", line 35, in forward
    part_linear_out = [self.linear(
  File "/space0/houyh/workspace/HEAL/opencood/models/sub_modules/pillar_vfe.py", line 35, in <listcomp>
    part_linear_out = [self.linear(
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/space0/houyh/miniconda3/envs/heal/lib/python3.8/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

yifanlu0227 commented 5 months ago

I have never encountered this bug. Can you try some other YAML configs? Do they reproduce the same error?

USTBhyh commented 5 months ago

Thanks. I've tried almost all of the LidarOnly YAML configs under the dair and v2v datasets, and I hit this issue with every one of them. It is probably a problem with my environment, and I'm trying to fix it.
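For anyone who wants to check whether the environment is the culprit, independent of HEAL, a minimal sketch follows. It assumes only that PyTorch is installed; the helper name `collect_env_report` and the layer sizes (10 inputs, 64 outputs, batch of 32) are hypothetical choices for illustration, not taken from the HEAL code. It records the PyTorch and CUDA build versions and runs one small float32 `nn.Linear` forward pass on the GPU, which is the same kind of call that fails at the bottom of the traceback.

```python
import importlib


def collect_env_report():
    """Gather version info and try one small nn.Linear forward pass on the GPU.

    Returns a dict of findings. The layer sizes are arbitrary (hypothetical),
    chosen only to exercise a float32 matrix multiply.
    """
    report = {
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "device": None,
        "linear_ok": None,
        "error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        report["error"] = f"torch not importable: {exc}"
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    try:
        report["device"] = torch.cuda.get_device_name(0)
        layer = torch.nn.Linear(10, 64).cuda()
        x = torch.randn(32, 10, device="cuda")
        layer(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        report["linear_ok"] = True
    except RuntimeError as exc:
        report["linear_ok"] = False
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `linear_ok` comes back `False` here, the failure can be reproduced without the HEAL code, which would point toward the PyTorch/CUDA installation rather than a particular YAML config. If it comes back `True`, this check is inconclusive and says nothing about the cause.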


yifanlu0227 / HEAL

I ran into a bug while reproducing your work #10