mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many other model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

Metal fails to initialize with: ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid." #755

Open philippjbauer opened 1 year ago

philippjbauer commented 1 year ago

LocalAI version:

LocalAI version v1.20.1-57-gd0e67cc (d0e67cce7550389b657d37bc5956ce4a9e925321)

(compiled with Metal flag)

Environment, CPU architecture, OS, and Version:

Local, Apple M1 Pro, macOS Ventura 13.4.1

Describe the bug

When a model is loaded with gpu_layers: 1 and f16: true set in its YAML file, Metal initialization fails. Without these parameters, the same models load and generate answers.

To Reproduce

Build the project with Metal enabled. Add the gpu_layers and f16 parameters to the model's YAML file, following the documentation comment on enabling Metal support. Send a question to LocalAI using the selected model.
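
A minimal config that reproduces the failure might look like the sketch below. Only gpu_layers and f16 come from this report; the file name, model name, and remaining fields are illustrative:

    # hypothetical models/vicuna.yaml
    name: vicuna
    parameters:
      model: vicuna-7b-1.1.ggmlv3.q4_0.bin
    context_size: 1024
    f16: true       # shows up as F16Memory:true in the gRPC options below
    gpu_layers: 1   # shows up as NGPULayers:1 and triggers ggml_metal_init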

Expected behavior

Metal initialization succeeds and LocalAI generates a response with Metal acceleration.

Logs

The ggml_metal_init output at the end appears only when the aforementioned parameters are set in the model's YAML file; it is absent when they are not.

DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:/PATH/TO/LocalAI/models/vicuna-7b-1.1.ggmlv3.q4_0.bin ContextSize:1024 Seed:0 NBatch:512 F16Memory:true MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:1 MainGPU: TensorSplit: Threads:4 LibrarySearchPath:}
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama.cpp: loading model from /PATH/TO/LocalAI/models/vicuna-7b-1.1.ggmlv3.q4_0.bin
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: format     = ggjt v3 (latest)
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_vocab    = 32000
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_ctx      = 1024
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_embd     = 4096
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_mult     = 256
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_head     = 32
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_layer    = 32
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_rot      = 128
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: ftype      = 2 (mostly Q4_0)
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: n_ff       = 11008
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: model size = 7B
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: ggml ctx size = 3615.72 MB
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_model_load_internal: mem required  = 5407.72 MB (+ 1026.00 MB per state)
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr llama_new_context_with_model: kv self size  =  256.00 MB

>>>> RELEVANT PORTION <<<<<<
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr ggml_metal_init: allocating
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr ggml_metal_init: using MPS
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr ggml_metal_init: loading '(null)'
DBG GRPC(llama-vicuna-7b-1.1.ggmlv3.q4_0.bin-localhost:63879): stderr ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid."
>>>> RELEVANT PORTION <<<<<<

Additional Information

Tried with the following models:

- vicuna-7b-1.1.ggmlv3.q4_0.bin
- nous-hermes-13b.ggmlv3.q4_0.bin
- Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin

philippjbauer commented 1 year ago

I can run go-llama / llama.cpp with Metal acceleration from within the go-llama folder. I've tried copying ggml-metal.metal into the various folders it would plausibly be loaded from, but with no success.

I would expect it to be loaded from the CWD that LocalAI is installed in, as the Makefile indicates.

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and, generally speaking, I can give good tips or point out places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me; I don't support replying (yet)!

:warning::warning::warning::warning::warning:

It seems like the issue is related to the loading of the model file. The error message indicates that there might be a problem with the file name or path. Please double check the file name and path and ensure that it is correct. Additionally, please try restarting the application and running the code again to see if the issue persists. If the issue still occurs, please provide more information about the environment and setup used for debugging.

mudler commented 1 year ago

Hey @philippjbauer, I don't have Apple hardware; could you give #777 a shot?

Happyholic1203 commented 1 year ago

This worked for me: just copy ggml-metal.metal to wherever you copied the executable (or libllama.so, I guess).

For example, if you cp ./main /tmp/main, then you also have to cp ./ggml-metal.metal /tmp, otherwise /tmp/main won't run when MPS is enabled.
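
As a concrete sketch of that workaround (the /tmp paths are the example from above; -ngl is llama.cpp's flag for offloading layers to the GPU, and the model path is a placeholder):

    # copy the llama.cpp binary somewhere else...
    cp ./main /tmp/main
    # ...and bring the Metal shader source along; without it ggml_metal_init
    # logs "loading '(null)'" and fails with NSCocoaErrorDomain Code=258
    cp ./ggml-metal.metal /tmp
    # offload at least one layer so the Metal path is exercised
    cd /tmp && ./main -m /PATH/TO/model.bin -ngl 1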

The problem is in ggml-metal.m, where path becomes null when ggml-metal.metal isn't in the same directory as the executable:

    // read the source from "ggml-metal.metal" into a string and use newLibraryWithSource
    {
        // ...
        NSBundle * bundle = [NSBundle bundleForClass:[GGMLMetalClass class]];
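        // note: for a bare executable, bundleForClass resolves to the
        // directory containing the binary; pathForResource returns nil if
        // ggml-metal.metal isn't found there, hence the 'loading (null)' log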
        NSString * path = [bundle pathForResource:@"ggml-metal" ofType:@"metal"];
        fprintf(stderr, "%s: loading '%s'\n", __func__, [path UTF8String]);

        NSString * src  = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:&error];
        if (error) {
            // this error got triggered
            fprintf(stderr, "%s: error: %s\n", __func__, [[error description] UTF8String]);
            exit(1);
        }
mudler commented 1 year ago

After building, the ggml-metal.metal file should already be next to the binary (at least after #777), and nothing else should be needed anymore. Can you confirm?
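
One quick way to confirm (a sketch assuming the default Makefile output, i.e. a local-ai binary in the repository root):

    # after a Metal-enabled build, both files should sit side by side
    ls -l ./local-ai ./ggml-metal.metal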

Happyholic1203 commented 1 year ago

Yes, you are right, mudler!

By default, ggml-metal.metal (not ggml-metal.m, which gets compiled and linked into the executables) is already in the same directory as the executables, and nothing extra is required to run models using MPS.

I posted that comment because I happened to move ./main around and hit the exact same error; moving ggml-metal.metal into the same directory (after the move) resolved it.

Maybe philippjbauer ran into a different problem than mine, but the error message is the same, and I searched through the entire code base: there is only one place that message can come from.