Closed njalan closed 2 months ago
Please check if the downloaded WASM file exists and is valid. This shows the wasm file is broken.
@hydai Thanks for your reply. I also tried `wasmedge-ggml-llama-interactive.wasm` and hit the same issue. Is there any command to check whether a wasm file is valid? Below are my file sizes:

```
-rw-r--r-- 1 root root 244087 Mar 11 23:06 wasmedge-ggml-llama-interactive.wasm
-rw-r--r-- 1 root root   6387 Mar 12 10:10 wasmedge-ggml-llama.wasm
```
The `wasmedge-ggml-llama.wasm` file should be 2.14 MB. Please clone the project directly to get the file.
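As a quick sanity check (not an official validation tool), you can inspect the first four bytes of the file: every WebAssembly binary starts with the magic header `\0asm` (`00 61 73 6d`). A minimal sketch, using the filename from the listing above:

```shell
# A valid wasm binary starts with the magic bytes 00 61 73 6d ("\0asm").
head -c 4 wasmedge-ggml-llama.wasm | od -An -tx1
# A truncated or broken download (e.g. a saved HTML error page)
# will print something other than "00 61 73 6d".
```

A 6 KB file is far below the expected 2.14 MB, so the download was almost certainly incomplete regardless of what the header says.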
@hydai I am running the command below on a GPU machine, but I want to disable the GPU. Is there any parameter to disable it?

```shell
wasmedge --dir .:. --nn-preload default:GGML:CPU:chinese-llama-2-7b.Q5_K_S.gguf wasmedge-ggml-llama.wasm default
```
```
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
```
If you are talking about this example: https://github.com/second-state/WasmEdge-WASINN-examples/blob/master/wasmedge-ggml/llama/src/main.rs#L30-L35
Then using `--env n_gpu_layers=0` will disable the GPU.
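Putting that together with the command above, the full invocation would look like this (model and wasm filenames taken from the earlier command):

```shell
# Assumption: the example reads n_gpu_layers from the environment,
# as in the linked main.rs; 0 keeps all layers on the CPU.
wasmedge --dir .:. \
  --env n_gpu_layers=0 \
  --nn-preload default:GGML:CPU:chinese-llama-2-7b.Q5_K_S.gguf \
  wasmedge-ggml-llama.wasm default
```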
@hydai Many thanks for your reply. Why are there duplicated answers?

```
[You]: Who is the "father of the atomic bomb"?
[Bot]:
恩里科·费米 (Enrico Fermi) [INST]
```
Different models expect different prompt templates. I believe this model uses another style of prompt; if you are using the built-in prompt, it may not work well.
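For reference, the stray `[INST]` token in the output above suggests a Llama-2-style chat template. The sketch below builds such a prompt; the helper name `llama2_prompt` is mine, and the exact template for this particular model should be verified against its model card before use:

```rust
/// Build a Llama-2 style chat prompt.
/// Hypothetical helper; verify the exact template against the model card.
fn llama2_prompt(system: &str, user: &str) -> String {
    format!("<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]")
}

fn main() {
    let prompt = llama2_prompt(
        "You are a helpful assistant.",
        "Who is the \"father of the atomic bomb\"?",
    );
    println!("{prompt}");
}
```

Feeding a prompt in the template the model was trained on usually stops it from emitting template tokens like `[INST]` in its answers.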
@hydai Last question: is there any performance benefit to using llama.cpp directly? Also, I couldn't find any parameter to use multiple threads. If I have a server with 100 cores and 512 GB of memory, is there any way to make full use of the CPU and memory?
`llama.cpp` is one of our backends, so comparing the performance between `llama.cpp` and us is meaningless.
The main story is all about portability. You can write a Rust program to control all of these parameters, compile it into the Wasm application, and ship it everywhere.
If you want to control some details, just modify the examples, and use these configurations in the metadata: https://github.com/WasmEdge/WasmEdge/blob/master/plugins/wasi_nn/ggml.cpp#L35-L58
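As a sketch of how those metadata options might be set from the Rust side, something like the following could build the config JSON passed to the WASI-NN graph builder. The key names (`threads`, `n-gpu-layers`, `ctx-size`) are my reading of the linked ggml.cpp and should be double-checked there:

```rust
/// Build a metadata JSON string for the WASI-NN GGML backend.
/// Key names are assumptions based on the linked ggml.cpp; verify them there.
fn ggml_metadata(threads: u32, n_gpu_layers: u32, ctx_size: u32) -> String {
    format!(
        "{{\"threads\": {threads}, \"n-gpu-layers\": {n_gpu_layers}, \"ctx-size\": {ctx_size}}}"
    )
}

fn main() {
    // e.g. a 100-core server with the GPU disabled:
    println!("{}", ggml_metadata(100, 0, 512));
}
```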
Here is the command:

```shell
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:chinese-llama-2-7b.Q5_K_S.gguf \
  wasmedge-ggml-llama.wasm default
```

Below is the error message:

```
[2024-03-12 11:31:01.100] [error] loading failed: magic header not detected, Code: 0x23
[2024-03-12 11:31:01.101] [error] Bytecode offset: 0x00000000
[2024-03-12 11:31:01.101] [error] At AST node: component
```