zcourts opened 1 year ago
Already on it, got it converted and quantized, but it produced gibberish. I'm waiting on https://github.com/ggerganov/llama.cpp/issues/1602 to see how they will handle the Q, K, V weights. I don't want to create two separate falcon-ggml ecosystems, so I'm waiting for the upstream ggml implementation.
Ongoing discussion worth tracking here to get GG conversion https://github.com/ggerganov/llama.cpp/issues/1602
Found after posting this here. An attempt to convert has been made https://github.com/ggerganov/llama.cpp/issues/1602#issuecomment-1570827592
Looks like our posts overlapped! Great to hear, I've offered to provide GPU access to further the work being done in https://github.com/ggerganov/llama.cpp/issues/1602 - will follow up as that progresses
There is now a working GGML example for 40B: https://github.com/ggerganov/ggml/pull/231
That's great! Maybe I will create a draft, but I would like to wait until it gets merged into ggml.
Working one here https://github.com/jploski/ggml/tree/falcon40b
Yeah, I noticed that. It would be great if someone could try porting it to Rust. I'm currently quite busy implementing GPU acceleration for all architectures.😬
Damn, was hoping editing the description would cancel out the issue-closing.
Anyhow - I've merged in the Falcon 7B implementation, but it doesn't handle 40B, and it requires 32-bit memory tensors because the repeat operation it uses doesn't work with 16-bit tensors. Because of these caveats - and the continuing work on (one of) the original implementations in https://github.com/cmp-nct/ggllm.cpp - I've decided to merge it in, but disable it by default.
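To give a rough sense of why the 32-bit memory tensor requirement is a real caveat, here's a back-of-the-envelope KV-cache size comparison. The dimensions are Falcon-7B's published config (32 layers, 4544 hidden size, 2048 context); the naive formula below assumes full-size K/V tensors per layer and ignores Falcon's multi-query attention, which stores far less, so treat these as illustrative upper bounds rather than exact figures:

```python
# Naive KV-cache memory estimate: one (n_ctx, n_embd) K tensor and one
# V tensor per layer, at full context length. Dimensions are from the
# published Falcon-7B config; multi-query attention would shrink these
# numbers considerably, so this is an upper-bound sketch only.
N_LAYERS = 32
N_EMBD = 4544
N_CTX = 2048

def kv_cache_bytes(bytes_per_element: int) -> int:
    # 2x for K and V
    return 2 * N_LAYERS * N_CTX * N_EMBD * bytes_per_element

f32_bytes = kv_cache_bytes(4)  # 32-bit memory tensors (current requirement)
f16_bytes = kv_cache_bytes(2)  # 16-bit memory tensors (blocked by repeat op)

print(f"f32 KV cache: {f32_bytes / 2**30:.2f} GiB")  # ~2.22 GiB
print(f"f16 KV cache: {f16_bytes / 2**30:.2f} GiB")  # ~1.11 GiB
```

In other words, being stuck on f32 memory tensors roughly doubles the cache footprint relative to f16.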
I'll keep this issue open until Falcon is truly ready to fly.
@LLukas22 should we close this or wait until the model format has stabilised?
We should wait until GGUF is implemented and we have all the necessary fields in the model file.
Similar to MPT, Falcon is Apache licensed, weights and all!
And according to the Hugging Face leaderboard, it outperforms all current open-source models, including MPT.
It seems having a GGML conversion done of the model is a necessary precursor to having it included.
I don't think I have the expertise to do this, but we may be able to help (e.g. we can give access to a V100S to do the conversion).