sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
MIT License

[Bug]: multi-GPU, TypeError: 'NoneType' object is not a mapping #66

Open n2729648074 opened 1 month ago

n2729648074 commented 1 month ago

Describe the bug

Thank you very much for your excellent work! When I train with multiple GPUs, Triton's autotuner.py raises an error at full_nargs = {**self.nargs, **kwargs, **self.best_config.kwargs}: TypeError: 'NoneType' object is not a mapping.

But when I train with a single GPU, the error doesn't trigger. So I'd like to ask how to train in parallel on multiple GPUs without errors.

Steps to reproduce the bug

File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/layers/gsa.py", line 140, in forward hidden_states = self.norm(hidden_states) File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl return forward_call(*args, *kwargs) File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 659, in forward return rms_norm_fn( File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 526, in rms_norm_fn return LayerNormFn.apply( File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/autograd/function.py", line 539, in apply return super().apply(args, kwargs) # type: ignore[misc] File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/utils.py", line 12, in wrapper return fn(ctx, File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 415, in forward y, mean, rstd, residual_out = _layer_norm_fwd( File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 172, in _layer_norm_fwd _layer_norm_fwd_1pass_kernel[(M,)]( File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 143, in run timings = {config: self._bench(*args, config=config, *kwargs) for config in pruned_configs} File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 143, in timings = {config: self._bench(args, config=config, kwargs) for config in pruned_configs} File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 104, in _bench full_nargs = {self.nargs, **current} TypeError: 'NoneType' object is not a mapping

Expected behavior

I'd like to ask how to train in parallel on multiple GPUs without errors.
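One workaround worth trying, under the assumption that the crash comes from thread-based data parallelism (e.g. torch.nn.DataParallel) sharing one Triton autotuner instance across GPUs: launch one process per GPU with DistributedDataParallel, so each process keeps its own autotuner state. The sketch below is illustrative only; the tiny linear model and random data are placeholders for the actual fla-based model and dataset.

```python
# Minimal DDP sketch: one process per GPU, launched with e.g.
#   torchrun --nproc_per_node=2 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; in the real setup this would wrap the fla layers.
    model = torch.nn.Linear(128, 128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])   # one process owns one GPU

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(8, 128, device=local_rank)     # placeholder data
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```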

Environment info

  1. torch:
  2. triton:
yzhangcs commented 1 month ago

@n2729648074 Have you found the problem? I don't have an environment at hand to reproduce the bug :-(

cgz6498 commented 3 weeks ago

Has it been resolved?

yzhangcs commented 3 weeks ago

@cgz6498 Hi, can you reproduce the problem when running the example code in https://github.com/sustcsonglin/flash-linear-attention/tree/main/training?