Open zhangyuanscall opened 1 year ago
Bot detected the issue body's language is not English, translate it automatically.
The same question
ColossalAI replaces the __init__ function of the classes registered under torch.nn.modules.module.Module, which can be seen as a registration pool. So if you use a custom nn.Module that has not been imported before entering ColoInitContext, there will be unseen nn.Module subclasses after your model is initialized (if your model uses such custom components), and an error is raised on exiting ColoInitContext when the _old_init attribute is checked. Therefore, you should import any custom components at the start of your training script, before entering ColoInitContext.
Here is how the number of registered nn.Module subclasses changes as packages are imported:
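The mechanism described above can be illustrated with a toy, torch-free sketch. This is not ColossalAI's actual code: Module, ToyInitContext, Linear and LateActivation are hypothetical stand-ins, and the only assumption carried over from the explanation is that every known subclass gets an _old_init attribute on entry, which is checked on exit.

```python
import functools


class Module:
    """Hypothetical stand-in for torch.nn.Module, for illustration only."""

    def __init__(self):
        self.post_init_ran = False


def all_subclasses(cls):
    """Collect every direct and indirect subclass of cls, without duplicates."""
    found = []
    for sub in cls.__subclasses__():
        if sub not in found:
            found.append(sub)
        for deeper in all_subclasses(sub):
            if deeper not in found:
                found.append(deeper)
    return found


def _with_post_init(init):
    """Wrap an __init__ so that a flag is set after the original runs."""
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        init(self, *args, **kwargs)
        self.post_init_ran = True
    return wrapper


class ToyInitContext:
    """Patches __init__ of every subclass that exists at entry time."""

    def __enter__(self):
        known = all_subclasses(Module)
        for sub in known:  # snapshot the originals before patching anything
            sub._old_init = sub.__init__
        for sub in known:
            sub.__init__ = _with_post_init(sub._old_init)
        return self

    def __exit__(self, exc_type, exc, tb):
        for sub in all_subclasses(Module):
            # A class that appeared after __enter__ never received _old_init.
            if "_old_init" not in vars(sub):
                raise AttributeError(
                    f"_old_init not found in {sub.__name__}; "
                    "it was defined after the context was entered"
                )
            sub.__init__ = sub._old_init
        return False


class Linear(Module):
    """Defined before the context is entered, so it is patched and restored."""


def demo_late_class():
    """Return the error message caused by a class defined inside the context."""
    try:
        with ToyInitContext():
            class LateActivation(Module):  # plays the role of a late import
                pass
    except AttributeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(demo_late_class())
```

In this sketch, Linear is patched and restored cleanly, while LateActivation triggers the error on exit because it joined the subclass pool after the snapshot was taken.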
>>> import torch
>>> len(torch.nn.modules.module.Module.__subclasses__())
110
>>> import transformers.activations # some custom activations
>>> len(torch.nn.modules.module.Module.__subclasses__())
122
>>> import transformers.pytorch_utils # Conv1d
>>> len(torch.nn.modules.module.Module.__subclasses__())
123
>>>
If you get an error about a component and do not know which package it comes from, you can print torch.nn.modules.module.Module.__subclasses__() to find it.
The _old_init error usually happens with dynamically imported models, especially models loaded from Hugging Face with remote code. So the best solution is to import model.py in advance. If your model code lives at path.model.py, import it in advance as follows, and torch.nn.modules.module.Module.__subclasses__() will then include its classes.
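When the raw list is long, grouping the subclasses by their defining module can make the lookup easier. The helper below is a minimal sketch that works on any base class; the name subclasses_by_module is hypothetical and not part of torch or ColossalAI.

```python
from collections import defaultdict


def subclasses_by_module(base):
    """Map module name -> sorted class names for all subclasses of base."""
    grouped = defaultdict(list)
    seen = set()
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        # __module__ is the dotted name of the module that defined the class.
        grouped[cls.__module__].append(cls.__qualname__)
        stack.extend(cls.__subclasses__())
    return {module: sorted(names) for module, names in sorted(grouped.items())}


# Usage with torch (not executed here, since it requires torch to be installed):
#   import torch
#   for module, names in subclasses_by_module(torch.nn.Module).items():
#       print(module, names)
```

Unlike a plain __subclasses__() call, this walks indirect subclasses as well, so a component that inherits from another custom module is also listed.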
>>> import importlib
>>> x = importlib.import_module('path.model')
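If several modules need to be imported in advance, the same call can be wrapped in a small helper and run at the top of the training script. This is a sketch only: preimport is a hypothetical name, and the standard-library module names are placeholders so that the snippet runs anywhere. In a real script the list would hold your own model modules, such as 'path.model' above.

```python
import importlib


def preimport(module_names):
    """Import each dotted module name eagerly and return the loaded modules."""
    loaded = {}
    for name in module_names:
        loaded[name] = importlib.import_module(name)
    return loaded


# Placeholder names; replace them with the modules that define your custom
# nn.Module classes (for example "path.model").
modules = preimport(["json", "collections.abc"])

# Only after this point should the script enter ColoInitContext and build
# the model, so that every custom class is already registered.
```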
Describe the bug
I used the example at https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/train_gpt_demo.py as a basis to fine-tune ChatGLM, but it reports an error.
code
error
My full fine-tuning training script:
Environment
Package          Version
accelerate       0.17.1
bitsandbytes     0.37.1
datasets         2.12.0
decorator        4.4.2
deepspeed        0.9.1
huggingface-hub  0.13.3
numpy            1.23.0
torch            1.11.0+cu113
torchaudio       0.11.0+rocm4.5.2
torchvision      0.12.0+cu113
colossalai       0.2.8