hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible
https://www.colossalai.org
Apache License 2.0

[BUG]: llama2-70B gemini.sh error #4638

Open lvbu12 opened 1 year ago

lvbu12 commented 1 year ago

🐛 Describe the bug

File "/usr/local/lib/python3.10/dist-packages/torch/decomp/decompositions.py", line 1958, in uniform return self.copy_((high - low) torch.rand_like(self) + low) File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazyinit.py", line 476, in wrapper return self.copy((high - low) torch.rand_like(self) + low) File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazy_init.py", line 476, in wrapper return self.tensor_cls(orig_target, *args[1:], device=orig_t.device, dtype=orig_t.dtype, kwargs) File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazy_init.py", line 163, in new elem = func(args, {kwargs, 'device': 'meta'}) return self.tensor_cls(orig_target, args[1:], device=orig_t.device, dtype=orig_t.dtype, kwargs) TypeError File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazy_init.py", line 163, in new : rand() received an invalid combination of arguments - got (dtype=torch.dtype, device=str, ), but expected one of:

Environment

pytorch: 2.0.1+cu117

Fridge003 commented 1 year ago

Hi, currently the lazy initialization feature doesn't support torch 2.0 well, so could you please downgrade torch to 1.13/1.12 and try again? If you want to avoid using lazy initialization, please change lines 197-198 of pretrain.py to init_ctx = nullcontext(), but this might cause OOM for Llama-70B.
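
A minimal sketch of that workaround, assuming pretrain.py wraps model construction in an init_ctx context manager; the tiny LlamaConfig below is purely illustrative (the real script builds the 70B configuration):

    from contextlib import nullcontext
    from transformers import LlamaConfig, LlamaForCausalLM

    # Replace the LazyInitContext(...) assignment with a no-op context manager so
    # that weights are materialized eagerly. For Llama2-70B this loads the full
    # model into host memory and may OOM, as noted above.
    init_ctx = nullcontext()

    with init_ctx:
        # Illustrative tiny config; pretrain.py constructs the actual model config.
        model = LlamaForCausalLM(
            LlamaConfig(hidden_size=128, num_hidden_layers=2,
                        num_attention_heads=4, intermediate_size=256)
        )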