mindspore-lab / mindnlp

Easy-to-use and high-performance NLP and LLM framework based on MindSpore, compatible with models and datasets from 🤗 Hugging Face.
https://mindnlp.cqu.ai/
Apache License 2.0

mindnlp fine-tuning #1761

Closed lwc312123 closed 3 weeks ago

lwc312123 commented 3 weeks ago

```python
from mindnlp.transformers import AutoTokenizer, AutoModelForCausalLM

hf_token = 'your_huggingface_access_token'

tokenizer = AutoTokenizer.from_pretrained("/data/applications/lmd-formal/backend/BaseModels/gemma-7b", token=hf_token)
model = AutoModelForCausalLM.from_pretrained("/data/applications/lmd-formal/backend/BaseModels/gemma-7b", token=hf_token)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="ms")

# Reduce the size of the input data: truncate along the sequence dimension
# (slicing with [:50] alone would cut the batch dimension, which is a no-op here).
input_ids = {
    'input_ids': input_ids['input_ids'][:, :50],
    'attention_mask': input_ids['attention_mask'][:, :50]
}

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
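A note on the truncation step: slicing the tensors after tokenization works, but the tokenizer can do the same job at encoding time. A minimal sketch, reusing the local checkpoint path from the script above; `truncation=True` and `max_length` follow the Hugging Face tokenizer convention that mindnlp mirrors:

```python
from mindnlp.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/data/applications/lmd-formal/backend/BaseModels/gemma-7b"
)

# Cap the sequence length at encoding time instead of slicing tensors by hand;
# truncation drops tokens beyond max_length along the sequence dimension.
inputs = tokenizer(
    "Write me a poem about Machine Learning.",
    return_tensors="ms",
    truncation=True,
    max_length=50,
)
```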

The above is the .py file from mindnlp, at the path mindnlp/llm/inference/gemma/run_gemma.py. When I run it on an NPU, MindSpore fails with a memory allocation error and a segmentation fault. The detailed errors are below; how do I solve this?

```
[ERROR] PRE_ACT(54175,fffdbcff91e0,python):2024-10-22-11:24:55.301.443 [mindspore/ccsrc/backend/common/mem_reuse/mem_dynamic_allocator.cc:392] AddMemBlockAndMemBufByEagerFree] TotalUsedMemStatistics : 29356337664 plus TotalUsedByEventMemStatistics : 0 and plus alloc size : 301990400 is more than total mem size : 29464985600.
Traceback (most recent call last):
  File "/data/applications/workspace/mindnlp/mindnlp1/mindnlp/llm/inference/gemma/run_gemma.py", line 17, in <module>
    outputs = model.generate(**input_ids)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 2025, in generate
    result = self._sample(
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 3024, in _sample
    while self._has_unfinished_sequences(
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 2231, in _has_unfinished_sequences
    if this_peer_finished:
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/tensor.py", line 347, in __bool__
    data = self.asnumpy()
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/_stub_tensor.py", line 49, in fun
    return method(*arg, **kwargs)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/tensor.py", line 1055, in asnumpy
    return Tensor_.asnumpy(self)
RuntimeError: Allocate memory failed
```
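The first error is a plain device out-of-memory: the allocator already holds ~29.3 GB of a ~29.5 GB pool when it tries to grab another ~300 MB, so a full-precision 7B model plus a growing KV cache simply does not fit. Two mitigations worth trying, sketched under the assumption that mindnlp mirrors the Hugging Face `from_pretrained`/`generate` keywords (`ms_dtype` and `max_new_tokens` are not in the original script, so treat them as suggested additions):

```python
import mindspore as ms
from mindnlp.transformers import AutoModelForCausalLM

# Load the weights in float16 to roughly halve device memory for a 7B model.
# ms_dtype is mindnlp's analogue of Hugging Face's torch_dtype.
model = AutoModelForCausalLM.from_pretrained(
    "/data/applications/lmd-formal/backend/BaseModels/gemma-7b",
    ms_dtype=ms.float16,
)

# Bound the generation length so the KV cache stops growing until memory runs out.
outputs = model.generate(**input_ids, max_new_tokens=64)
```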


```
[ERROR] KERNEL(54175,fffdbcff91e0,python):2024-10-22-11:26:24.972.704 [mindspore/ccsrc/plugin/device/ascend/kernel/acl/acl_kernel_mod.cc:260] Launch] Kernel launch failed, msg: Acl compile and execute failed, optype:BitwiseOr

(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)

[ERROR] DEVICE(54175,fffdbcff91e0,python):2024-10-22-11:26:24.972.826 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_kernel_executor.cc:1156] LaunchKernel] Launch kernel failed, kernel full name: Default/BitwiseOr-op0
[ERROR] RUNTIME_FRAMEWORK(54175,ffff93ba20e0,python):2024-10-22-11:26:25.892.561 [mindspore/ccsrc/runtime/pipeline/async_rqueue.cc:198] WorkerJoin] WorkerJoin failed: Launch kernel failed, name:Default/BitwiseOr-op0
```
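The second failure is a different problem: the ACL kernel for `BitwiseOr` fails to compile and launch, which usually points at a CANN/MindSpore version mismatch rather than at this script. One way to test that theory is to run the op in isolation; a minimal sketch:

```python
import mindspore as ms
from mindspore import Tensor, ops

# Launch BitwiseOr alone on the NPU: if this single-op script also fails,
# the problem is the CANN/MindSpore installation, not mindnlp.
ms.set_context(device_target="Ascend")
x = Tensor([1, 2, 3], ms.int32)
y = Tensor([4, 5, 6], ms.int32)
print(ops.bitwise_or(x, y))
```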


lvyufeng commented 3 weeks ago

CANN version:
MindSpore version:
mindnlp version:
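For the MindSpore and mindnlp versions, something like the following works from Python; `run_check()` additionally launches a small kernel to verify the device setup (the `__version__` attribute on mindnlp is assumed here). The CANN version is typically recorded under the Ascend toolkit install directory, e.g. `/usr/local/Ascend/ascend-toolkit/latest`:

```python
import mindspore as ms
import mindnlp

print("MindSpore:", ms.__version__)
print("mindnlp:", mindnlp.__version__)  # assumes mindnlp exposes __version__
ms.run_check()  # sanity-checks that a simple op runs on the configured device
```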

lvyufeng commented 3 weeks ago

Please provide the exact versions, or try the latest mindnlp with MindSpore 2.3.1.
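(For reference, the upgrade would be something like `pip install -U mindnlp` together with the MindSpore 2.3.1 wheel that matches your CANN version from the official MindSpore installation page; the exact wheel depends on the Ascend hardware, so treat the command as a sketch.)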