Miss Requirement tensorboardX

bash99 commented 1 year ago

I'm using the latest code, and

from textgen import ChatGlmModel

fails with the error "No module named 'tensorboardX'".

After installing the missing packages with

pip3 install tensorboardX crc32c soundfile

the import works without errors.
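A quick way to spot missing optional dependencies like these before importing textgen is to probe for them with the standard library. This is a minimal sketch that is not part of textgen; the module list is just the three packages installed above, and the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path.

    find_spec only locates a module without executing it, so this is a cheap
    pre-flight check. Intended for top-level names: for dotted names,
    find_spec imports the parent package and may raise if it is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The extra packages that fixed the import error in this thread.
    required = ["tensorboardX", "crc32c", "soundfile"]
    missing = missing_modules(required)
    if missing:
        print("Missing, try: pip3 install " + " ".join(missing))
    else:
        print("All optional dependencies found.")
```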

shibing624 commented 1 year ago

Thanks for pointing that out, I'll add it.

bash99 commented 1 year ago

Thanks for pointing that out, I'll add it.

But testing the ChatGLM LoRA still fails (Python 3.10, PyTorch 2.0, CUDA 11.7) with this error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [8192, 8, 1, 1], but got 3-dimensional input of size [1, 16, 4096] instead

The full log is below. It shows that the original ChatGLM model and the LoRA were both freshly downloaded from Hugging Face:

~/dl/textgen/examples/chatglm$ python3 predict_demo.py
2023-04-13 17:05:12.591 | DEBUG | textgen.chatglm.chatglm_model:__init__:98 - Device: cuda
Downloading (…)lve/main/config.json: 100%|██████████| 773/773 [00:00<00:00, 489kB/s]
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)iguration_chatglm.py: 100%|██████████| 4.28k/4.28k [00:00<00:00, 2.66MB/s]
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)/modeling_chatglm.py: 100%|██████████| 56.5k/56.5k [00:00<00:00, 243kB/s]
Downloading (…)main/quantization.py: 100%|██████████| 15.1k/15.1k [00:00<00:00, 1.13MB/s]
Downloading (…)model.bin.index.json: 100%|██████████| 33.4k/33.4k [00:00<00:00, 1.68MB/s]
Downloading (…)l-00001-of-00008.bin: 100%|██████████| 1.74G/1.74G [03:47<00:00, 7.66MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|██████████| 1.88G/1.88G [03:52<00:00, 8.08MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|██████████| 1.98G/1.98G [04:02<00:00, 8.18MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|██████████| 1.91G/1.91G [03:59<00:00, 8.00MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|██████████| 1.88G/1.88G [03:59<00:00, 7.83MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|██████████| 1.88G/1.88G [03:56<00:00, 7.94MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|██████████| 1.07G/1.07G [02:01<00:00, 8.87MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████| 1.07G/1.07G [01:52<00:00, 9.54MB/s]
Downloading shards: 100%|██████████| 8/8 [27:40<00:00, 207.57s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:09<00:00, 1.14s/it]
Downloading (…)okenizer_config.json: 100%|██████████| 441/441 [00:00<00:00, 211kB/s]
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)enization_chatglm.py: 100%|██████████| 16.7k/16.7k [00:00<00:00, 4.17MB/s]
Downloading ice_text.model: 100%|██████████| 2.71M/2.71M [00:00<00:00, 6.89MB/s]
Downloading (…)/adapter_config.json: 100%|██████████| 383/383 [00:00<00:00, 174kB/s]
Downloading adapter_model.bin: 100%|██████████| 14.7M/14.7M [00:03<00:00, 4.42MB/s]
2023-04-13 17:33:50.997 | INFO | textgen.chatglm.chatglm_model:load_lora:342 - Loaded lora model from shibing624/chatglm-6b-csc-zh-lora

Traceback (most recent call last):

/home/test/dl/textgen/examples/chatglm/predict_demo.py:12 in <module>
     9 from textgen import ChatGlmModel
    10
    11 model = ChatGlmModel("chatglm", "THUDM/chatglm-6b", lora_name="shibing624/chatglm-6b-csc
  ❱ 12 r = model.predict(["对下面中文拼写纠错：\n少先队员因该为老人让坐。\n答："])
    13 print(r)  # ['少先队员应该为老人让座。\n错误字：因，坐']
    14
    15 r = model.predict(["失眠怎么办？"])

/home/test/mambaforge/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in decorate_context
    112     @functools.wraps(func)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
  ❱ 115             return func(*args, **kwargs)
    116
    117     return decorate_context
    118

/home/test/dl/textgen/textgen/chatglm/chatglm_model.py:384 in predict
    381         if not self.lora_loaded:
    382             self.load_lora()
    383         self._move_model_to_device()
  ❱ 384         self.model.eval()
    385
    386         all_outputs = []
    387         # Batching

/home/test/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:2307 in eval
   2304         Returns:
   2305             Module: self
   2306         """
 ❱ 2307         return self.train(False)
   2308
   2309     def requires_grad_(self: T, requires_grad: bool = True) -> T:
   2310         r"""Change if autograd should record operations on parameters in this

/home/test/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py:2288 in train
   2285             raise ValueError("training mode is expected to be boolean")
   2286         self.training = mode
   2287         for module in self.children():
 ❱ 2288             module.train(mode)
   2289         return self
   2290
   2291     def eval(self: T) -> T:

[the module.py:2288 frame above repeats once per level of nested submodules]

/home/test/mambaforge/lib/python3.10/site-packages/peft/tuners/lora.py:417 in train
    414         if not mode and self.merge_weights and not self.merged:
    415             # Merge the weights and mark it
    416             if self.r > 0 and any(self.enable_lora):
  ❱ 417                 delta_w = F.conv1d(
    418                     self.lora_A.weight.data.unsqueeze(0),
    419                     self.lora_B.weight.data.unsqueeze(-1),
    420                     groups=sum(self.enable_lora),

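The traceback shows that model.eval() calls train(False) on every submodule, and in the peft version shown there the LoRA layer uses that moment to merge the adapter into the base weight (the merge_weights branch). Conceptually the merge computes W' = W + (alpha / r) · B A, where A (lora_A) has shape [r, in_features] and B (lora_B) has shape [out_features, r]. The sketch below shows only that arithmetic, in plain Python on tiny matrices; it is an illustration, not peft's code. peft performs the merge for the fused query_key_value projection with a grouped F.conv1d, which is the call that received the 4-D weight in the error above. The helper names matmul and merge_lora are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions must match")
    return [
        [sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the inputs.

    weight: [out_features][in_features]
    lora_a: [r][in_features]
    lora_b: [out_features][r]
    """
    # A rank mismatch here is the same kind of error as a checkpoint whose
    # lora_A has a different number of rows than the model expects.
    if len(lora_a) != r or any(len(row) != r for row in lora_b):
        raise ValueError("lora_a must have r rows and lora_b must have r columns")
    scaling = alpha / r
    delta = matmul(lora_b, lora_a)  # [out_features][in_features]
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
    A = [[1.0, 2.0]]              # r = 1, in_features = 2
    B = [[1.0], [0.0]]            # out_features = 2, r = 1
    print(merge_lora(W, A, B, alpha=2, r=1))  # [[3.0, 4.0], [0.0, 1.0]]
```

Because the merge only happens when switching to eval mode, a LoRA checkpoint can load without complaint and still fail at this step if its layout does not match what the installed peft expects.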

shibing624 commented 1 year ago

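Did you install the latest versions with these commands?

pip install git+https://github.com/huggingface/transformers
pip install git+https://github.com/huggingface/peft

I just tested it locally and it works. Let me know if you still have problems.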
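Since whether this works depends on exactly which transformers and peft builds are installed, it helps to paste the versions into the issue. A minimal stdlib sketch, assuming the PyPI distribution names torch, transformers and peft; packages that are not installed are reported as None, and the helper name report_versions is hypothetical. Installs from the GitHub main branch usually show a .dev0 suffix.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "transformers", "peft")):
    """Map each distribution name to its installed version (None if absent)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```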

bash99 commented 1 year ago

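Running python3 predict_demo.py still fails. Could you share the exact Python, PyTorch, and other package versions you are using? I'll create a clean conda environment and test again.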
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.layers.0.attention.query_key_value.lora_A.default.weight: copying a param with shape torch.Size([16, 4096]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
...
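The size-mismatch error means the adapter checkpoint and the freshly built model disagree on tensor shapes. For a LoRA layer the leading dimension of lora_A.weight is tied to the rank r. Judging only from the shapes in this thread (an inference, not something confirmed here), the older merged-layer layout in the first traceback stores r times the number of enabled sub-projections (the [1, 16, 4096] input there suggests 8 × 2 = 16), while the current model builds [8, 4096]. A mismatch like this can be detected before loading by comparing shape dictionaries. The sketch below is a hypothetical helper over plain tuples; in practice the shapes could come from tuple(t.shape) for each tensor in the two state dicts.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ.

    Both arguments map parameter names to shape tuples (or lists). Keys that
    appear in only one mapping are ignored; missing/unexpected keys are a
    separate check.
    """
    mismatches = []
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        ckpt, current = tuple(checkpoint_shapes[name]), tuple(model_shapes[name])
        if ckpt != current:
            mismatches.append((name, ckpt, current))
    return mismatches


if __name__ == "__main__":
    key = "base_model.model.transformer.layers.0.attention.query_key_value.lora_A.default.weight"
    # Key and shapes taken from the error message above.
    checkpoint = {key: (16, 4096)}
    model = {key: (8, 4096)}
    for name, ckpt, current in find_shape_mismatches(checkpoint, model):
        print(f"size mismatch for {name}: checkpoint {ckpt}, current model {current}")
```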

shibing624 commented 1 year ago

It's a peft version issue. If you train the LoRA yourself, it will work.

shibing624 commented 1 year ago

I just pushed an updated CSC (Chinese spelling correction) LoRA model: https://huggingface.co/shibing624/chatglm-6b-csc-zh-lora/tree/main
Install the latest peft: pip install git+https://github.com/huggingface/peft

bash99 commented 1 year ago

I just pushed an updated CSC LoRA model: https://huggingface.co/shibing624/chatglm-6b-csc-zh-lora/tree/main

Install the latest peft: pip install git+https://github.com/huggingface/peft

With the new LoRA, loading and inference work now, but in the demo the correction only catches one of the misspelled characters.

Generating outputs: 100%| ... | 1/1 [00:05<00:00,  5.84s/it]
['少先队员应该为老人让坐。\n错误字：因']

But running it again on the partially corrected sentence finds the next error:

r = model.predict(["对下面中文拼写纠错：\n少先队员应该为老人让坐。\n答："])
print(r)

['少先队员应该为老人让座。\n错误字：坐']
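Since feeding the corrected sentence back in surfaced the second error, one possible workaround while the model is under-trained is to repeat the prompt until the sentence stops changing. This is a sketch, not something suggested in the thread. It assumes the output format shown above (the corrected sentence, a newline, then 错误字： ("erroneous characters:") followed by the flagged characters separated by ，). The predict argument stands for any callable shaped like model.predict (a list of prompts in, a list of strings out); parse_output, correct_until_stable and fake_predict are hypothetical, and fake_predict only mimics the one-fix-per-call behavior observed here.

```python
PROMPT = "对下面中文拼写纠错：\n{}\n答："  # prompt template used in this thread
ERROR_PREFIX = "错误字："  # "erroneous characters:" marker in the model output


def parse_output(output):
    """Split a model reply into (corrected_sentence, list_of_flagged_chars)."""
    sentence, _, rest = output.partition("\n")
    flagged = []
    if rest.startswith(ERROR_PREFIX):
        flagged = [c for c in rest[len(ERROR_PREFIX):].split("，") if c]
    return sentence, flagged


def correct_until_stable(predict, text, max_rounds=3):
    """Re-run correction on its own output until the sentence stops changing.

    Returns the final sentence and every flagged character, in order.
    max_rounds bounds the number of model calls.
    """
    all_flagged = []
    for _ in range(max_rounds):
        sentence, flagged = parse_output(predict([PROMPT.format(text)])[0])
        if sentence == text:
            break
        all_flagged.extend(flagged)
        text = sentence
    return text, all_flagged


# Stand-in for model.predict that fixes at most one error per call.
FIXES = {"因该": ("应该", "因"), "让坐": ("让座", "坐")}


def fake_predict(prompts):
    text = prompts[0].split("\n")[1]
    for wrong, (right, char) in FIXES.items():
        if wrong in text:
            return [text.replace(wrong, right) + "\n" + ERROR_PREFIX + char]
    return [text + "\n" + ERROR_PREFIX]


if __name__ == "__main__":
    print(correct_until_stable(fake_predict, "少先队员因该为老人让坐。"))
    # ('少先队员应该为老人让座。', ['因', '坐'])
```

Each extra round costs another generation call, so this trades latency for recall; further training on a correction dataset would address the root cause.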

shibing624 commented 1 year ago

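I'd suggest training on a spelling-correction dataset yourself. I ran out of GPU resources, so I released the model after only about 0.2 epochs of training; I'll train it further when I have the time and resources. I also want to wait until peft stabilizes before releasing more models, since otherwise a peft update may break compatibility with older LoRA weights...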

shibing624 / textgen

Miss Requirement tensorboardX #16