from mindnlp.transformers import AutoTokenizer, AutoModelForCausalLM

hf_token = 'your_huggingface_access_token'
tokenizer = AutoTokenizer.from_pretrained("/data/applications/lmd-formal/backend/BaseModels/gemma-7b", token=hf_token)
model = AutoModelForCausalLM.from_pretrained("/data/applications/lmd-formal/backend/BaseModels/gemma-7b", token=hf_token)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="ms")

# Reduce the size of the input data: keep only the first 50 tokens along the sequence dimension
input_ids = {
    'input_ids': input_ids['input_ids'][:, :50],
    'attention_mask': input_ids['attention_mask'][:, :50],
}

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
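The slicing above is just my attempt to shrink the prompt. An equivalent way to keep memory down, assuming mindnlp keeps the transformers-style truncation/max_length arguments on the tokenizer and max_new_tokens on generate (I have not verified these parameter names against this mindnlp version), would be:

# Sketch only, under the assumptions stated above; continues from the snippet above.
inputs = tokenizer(
    input_text,
    return_tensors="ms",
    truncation=True,   # assumed: truncate the prompt at tokenization time
    max_length=50,
)
outputs = model.generate(**inputs, max_new_tokens=64)  # assumed: cap the generated length
print(tokenizer.decode(outputs[0]))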
The script above is the .py file in mindnlp, located at "mindnlp/llm/inference/gemma/run_gemma.py". When I run it on an NPU, MindSpore reports a memory allocation failure followed by a segmentation fault. The full error output is below. How should I go about solving this?
[ERROR] PRE_ACT(54175,fffdbcff91e0,python):2024-10-22-11:24:55.301.443 [mindspore/ccsrc/backend/common/mem_reuse/mem_dynamic_allocator.cc:392] AddMemBlockAndMemBufByEagerFree] TotalUsedMemStatistics : 29356337664 plus TotalUsedByEventMemStatistics : 0 and plus alloc size : 301990400 is more than total mem size : 29464985600.
Traceback (most recent call last):
  File "/data/applications/workspace/mindnlp/mindnlp1/mindnlp/llm/inference/gemma/run_gemma.py", line 17, in <module>
    outputs = model.generate(input_ids)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 2025, in generate
    result = self._sample(
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 3024, in _sample
    while self._has_unfinished_sequences(
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindnlp/transformers/generation/utils.py", line 2231, in _has_unfinished_sequences
    if this_peer_finished:
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/tensor.py", line 347, in __bool__
    data = self.asnumpy()
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/_stub_tensor.py", line 49, in fun
    return method(*arg, **kwargs)
  File "/data/applications/workspace/miniconda3/envs/mindnlp/lib/python3.10/site-packages/mindspore/common/tensor.py", line 1055, in asnumpy
    return Tensor_.asnumpy(self)
RuntimeError: Allocate memory failed
C++ Call Stack: (For framework developers)
mindspore/ccsrc/runtime/device/device_address_utils.cc:921 MallocForInput
[ERROR] KERNEL(54175,fffdbcff91e0,python):2024-10-22-11:26:24.972.704 [mindspore/ccsrc/plugin/device/ascend/kernel/acl/acl_kernel_mod.cc:260] Launch] Kernel launch failed, msg: Acl compile and execute failed, optype:BitwiseOr
Ascend Error Message:
E40021: Failed to compile Op [BitwiseOr3]. (oppath: [Pre-compile /usr/local/Ascend/ascend-toolkit/8.0.RC1.alpha003/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/bitwise_or.py failed with errormsg/stack: ], optype: [BitwiseOr])[THREAD:63823]
Solution: See the host log for details, and then check the Python stack where the error log is reported.
TraceBack (most recent call last):
Pre-compile op[BitwiseOr3] failed, oppath[/usr/local/Ascend/ascend-toolkit/8.0.RC1.alpha003/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/bitwise_or.py], optype[BitwiseOr], taskID[7]. Please check op's compilation error message.[FUNC:ReportBuildErrMessage][FILE:fusion_manager.cc][LINE:753][THREAD:63823]
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281432957424096] recompile single op[BitwiseOr3] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:956][THREAD:63823]
[SubGraphOpt][Compile][ParalCompOp] Thread[281432957424096] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1004][THREAD:63823]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1119][THREAD:63823]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1385][THREAD:63823]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition0_rank1_new_sub_graph2[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:126][THREAD:63823]
subgraph 0 optimize failed[FUNC:OptimizeSubGraphWithMultiThreads][FILE:graph_manager.cc][LINE:1021][THREAD:55783]
build graph failed, graph id:2, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615][THREAD:55783]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161][THREAD:55783]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:55783]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:55783]
(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)
C++ Call Stack: (For framework developers)
mindspore/ccsrc/transform/acl_ir/acl_utils.cc:379 Run
[ERROR] DEVICE(54175,fffdbcff91e0,python):2024-10-22-11:26:24.972.826 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_kernel_executor.cc:1156] LaunchKernel] Launch kernel failed, kernel full name: Default/BitwiseOr-op0
[ERROR] RUNTIME_FRAMEWORK(54175,ffff93ba20e0,python):2024-10-22-11:26:25.892.561 [mindspore/ccsrc/runtime/pipeline/async_rqueue.cc:198] WorkerJoin] WorkerJoin failed: Launch kernel failed, name:Default/BitwiseOr-op0
C++ Call Stack: (For framework developers)
mindspore/ccsrc/runtime/pynative/op_runner.cc:624 LaunchKernels
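The first allocator line already shows the pool is essentially exhausted: 29,356,337,664 bytes in use plus the 301,990,400-byte request is 29,658,328,064 bytes, more than the 29,464,985,600-byte total (about 27.4 GiB). What I was planning to try next is loading the weights in half precision, and possibly setting max_device_memory explicitly in case the NPU has free memory beyond the pool reported above. This is only a sketch based on the MindSpore and mindnlp docs; ms_dtype and max_device_memory are my assumptions and I have not confirmed they apply to this script:

import mindspore
from mindnlp.transformers import AutoModelForCausalLM

# Assumed: set_context(max_device_memory=...) governs the pool reported by the allocator,
# so this only helps if the device actually has more free memory than ~27.4 GiB.
mindspore.set_context(max_device_memory="30GB")

# Assumed: mindnlp accepts ms_dtype the way transformers accepts torch_dtype;
# float16 halves the weight memory relative to float32.
model = AutoModelForCausalLM.from_pretrained(
    "/data/applications/lmd-formal/backend/BaseModels/gemma-7b",
    ms_dtype=mindspore.float16,
)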