Thank you very much for your excellent work!
When I train on multiple GPUs, Triton's autotuner (triton/runtime/autotuner.py) fails at the line full_nargs = {**self.nargs, **kwargs, **self.best_config.kwargs} with TypeError: 'NoneType' object is not a mapping.
When I train on a single GPU, the error does not occur.
How can I train in parallel on multiple GPUs without hitting this error?
Steps to reproduce the bug
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/layers/gsa.py", line 140, in forward
    hidden_states = self.norm(hidden_states)
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 659, in forward
    return rms_norm_fn(
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 526, in rms_norm_fn
    return LayerNormFn.apply(
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/utils.py", line 12, in wrapper
    return fn(ctx,
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 415, in forward
    y, mean, rstd, residual_out = _layer_norm_fwd(
  File "/home/nzx/dcase_task4_sed/code/Transformer4SED-main/fla/modules/layernorm.py", line 172, in _layer_norm_fwd
    _layer_norm_fwd_1pass_kernel[(M,)](
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 143, in run
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 143, in <dictcomp>
    timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
  File "/home/nzx/anaconda3/envs/dcase2024/lib/python3.9/site-packages/triton/runtime/autotuner.py", line 104, in _bench
    full_nargs = {**self.nargs, **current}
TypeError: 'NoneType' object is not a mapping
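The failing expression unpacks self.nargs with **, so the message means self.nargs was None at that moment. A minimal sketch of how that could happen, assuming (not confirmed for this setup) that the autotuner keeps per-call arguments in an attribute on one shared object, resets it to None when a call finishes, and is entered by more than one worker thread at a time. ToyAutotuner and its methods are hypothetical stand-ins, not Triton code, and the interleaving is written out step by step so the result is deterministic:

```python
class ToyAutotuner:
    """Hypothetical stand-in for a shared autotuner object; not Triton code."""

    def __init__(self):
        self.nargs = None  # only populated while a call is in flight

    def begin(self, **named_args):
        # A call starts: remember its named arguments on the shared object.
        self.nargs = dict(named_args)

    def bench(self, **meta):
        # Same expression shape as the line reported in the traceback.
        return {**self.nargs, **meta}

    def end(self):
        # A call finishes: clear the shared state.
        self.nargs = None


def simulate_interleaving():
    tuner = ToyAutotuner()      # one object shared by two workers
    tuner.begin(x=1)            # worker A starts a call
    tuner.begin(x=2)            # worker B starts a call on the same object
    tuner.end()                 # worker B finishes first and clears the state
    try:
        tuner.bench(BLOCK=128)  # worker A now unpacks None
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(simulate_interleaving())
```

With a single worker, begin, bench, and end never overlap, which would be consistent with the error not appearing on a single GPU.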
Expected behavior
Multi-GPU training should run without this error, just as single-GPU training does. How should I set up parallel training to avoid it?
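One possible direction, sketched under assumptions and not verified on this setup: if the multi-GPU run uses several threads inside one process, calls into a shared, non-thread-safe object could be serialized with a lock. The names serialized, _launch_lock, SharedState, and launch below are hypothetical and only illustrate the pattern; launch is a toy function, not the real kernel launch.

```python
import functools
import threading

_launch_lock = threading.Lock()  # hypothetical module-level lock


def serialized(fn):
    """Allow only one thread at a time inside fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _launch_lock:
            return fn(*args, **kwargs)
    return wrapper


class SharedState:
    """Toy shared state that is unsafe to touch from two threads at once."""
    nargs = None


@serialized
def launch(value):
    # Set, use, and clear the shared state; the lock keeps these steps
    # from interleaving with another thread's call.
    SharedState.nargs = {"x": value}
    merged = {**SharedState.nargs, "BLOCK": 128}
    SharedState.nargs = None
    return merged["x"]


def run_demo(n_threads=8):
    results = [None] * n_threads

    def worker(i):
        results[i] = launch(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

An alternative that avoids shared state between workers altogether is to run one process per GPU, for example with torch.nn.parallel.DistributedDataParallel; whether either option resolves this particular error is an assumption that would need testing.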
Environment info