lvbu12 opened 1 year ago
Hi, the lazy initialization feature currently doesn't support torch 2.0 well, so could you please downgrade torch to 1.13 or 1.12 and try again? If you want to avoid lazy initialization altogether, change lines 197-198 of pretrain.py to init_ctx = nullcontext(), but note that this might cause OOM for Llama-70B.
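A minimal sketch of the two options above, assuming you want pretrain.py to pick the init context automatically from the installed torch version instead of editing lines 197-198 by hand. The helper names `torch_major_minor` and `choose_init_ctx` are hypothetical, and `lazy_ctx_factory` stands in for whatever pretrain.py already uses to build ColossalAI's LazyInitContext; nothing here is part of the ColossalAI API.

```python
from contextlib import nullcontext
from typing import Callable, ContextManager, Tuple


def torch_major_minor(version: str) -> Tuple[int, int]:
    """Parse a torch version string such as "2.0.1+cu117" into (2, 0)."""
    # Drop the local build suffix (e.g. "+cu117") before splitting on dots.
    base = version.split("+", 1)[0]
    major, minor = base.split(".")[:2]
    return int(major), int(minor)


def choose_init_ctx(
    torch_version: str,
    lazy_ctx_factory: Callable[[], ContextManager],
) -> ContextManager:
    """Hypothetical helper: skip lazy init on torch >= 2.0.

    On torch 2.x this returns nullcontext(), so the model's weights are
    materialized eagerly at construction time (which may OOM for very
    large models such as Llama-70B). On older torch it calls the
    caller-supplied factory, e.g. one that builds a LazyInitContext.
    """
    if torch_major_minor(torch_version) >= (2, 0):
        return nullcontext()
    return lazy_ctx_factory()
```

In pretrain.py this would be used roughly as `with choose_init_ctx(torch.__version__, make_lazy_ctx): model = ...`, where `make_lazy_ctx` is again a placeholder for the existing LazyInitContext construction. Downgrading torch keeps lazy init and its memory savings; the nullcontext branch trades those savings for compatibility.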
🐛 Describe the bug
  File "/usr/local/lib/python3.10/dist-packages/torch/_decomp/decompositions.py", line 1958, in uniform
    return self.copy_((high - low) * torch.rand_like(self) + low)
  File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazy_init.py", line 476, in wrapper
    return self.tensor_cls(orig_target, *args[1:], device=orig_t.device, dtype=orig_t.dtype, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/colossalai/lazy/lazy_init.py", line 163, in __new__
    elem = func(*args, **{**kwargs, 'device': 'meta'})
TypeError: rand() received an invalid combination of arguments - got (dtype=torch.dtype, device=str, ), but expected one of:
 (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
Environment
pytorch: 2.0.1+cu117