rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0
6.07k stars 355 forks

Support Falcon #293

Open zcourts opened 1 year ago

zcourts commented 1 year ago

Similar to MPT, Falcon is Apache licensed, weights and all!

  1. https://huggingface.co/tiiuae/falcon-40b
  2. https://huggingface.co/tiiuae/falcon-40b-instruct

And according to the Hugging Face leaderboard, it outperforms all current open-source models, including MPT.

It seems a GGML conversion of the model is a necessary precursor to its inclusion.

I don't think I have the expertise to do this myself, but we may be able to help in other ways (e.g. we can provide access to a V100S to run the conversion).

LLukas22 commented 1 year ago

Already on it. I got it converted and quantized, but it produced gibberish. I'm waiting on https://github.com/ggerganov/llama.cpp/issues/1602 to see how they will handle the Q, K, V weights. I don't want to create two separate falcon-ggml ecosystems, so I'm waiting for the upstream ggml implementation.
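For anyone following along, the crux is that Falcon stores Q, K, and V as one fused projection, interleaved per KV group (Falcon uses multi-query attention, so many query heads share a single K/V head). Here's a rough sketch of what splitting that fused weight might look like — the layout and names are assumptions based on the upstream discussion, not a tested implementation:

```rust
// Hypothetical sketch: splitting Falcon's fused QKV projection into
// separate Q, K, V matrices. Assumed layout: for each KV group, the rows
// are [q_0 .. q_{n-1}, k, v], where n = n_head / n_head_kv and each head
// contributes `head_dim` rows of a row-major matrix.
fn split_qkv(
    fused: &[f32], // (n_head_kv * (n + 2) * head_dim) rows x d_model cols
    d_model: usize,
    n_head: usize,
    n_head_kv: usize,
    head_dim: usize,
) -> (Vec<f32>, Vec<f32>, Vec<f32>) {
    let n = n_head / n_head_kv; // query heads per KV group
    let row = |i: usize| &fused[i * d_model..(i + 1) * d_model];

    let (mut q, mut k, mut v) = (Vec::new(), Vec::new(), Vec::new());
    for g in 0..n_head_kv {
        let base = g * (n + 2) * head_dim;
        for r in 0..n * head_dim {
            q.extend_from_slice(row(base + r)); // query heads for this group
        }
        for r in 0..head_dim {
            k.extend_from_slice(row(base + n * head_dim + r)); // shared key head
        }
        for r in 0..head_dim {
            v.extend_from_slice(row(base + (n + 1) * head_dim + r)); // shared value head
        }
    }
    (q, k, v)
}

fn main() {
    // Toy config: d_model = 2, 2 query heads, 1 KV group, head_dim = 1.
    // Fused rows in order: [q0, q1, k, v], each a length-2 row.
    let fused = [
        1.0, 1.0, // q0
        2.0, 2.0, // q1
        3.0, 3.0, // k
        4.0, 4.0, // v
    ];
    let (q, k, v) = split_qkv(&fused, 2, 2, 1, 1);
    assert_eq!(q, vec![1.0, 1.0, 2.0, 2.0]);
    assert_eq!(k, vec![3.0, 3.0]);
    assert_eq!(v, vec![4.0, 4.0]);
    println!("q rows: {}, k rows: {}, v rows: {}", q.len() / 2, k.len() / 2, v.len() / 2);
}
```

If upstream settles on a different interleaving, only the index arithmetic inside the loop changes.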

zcourts commented 1 year ago

There's an ongoing discussion worth tracking here to get a GGML conversion: https://github.com/ggerganov/llama.cpp/issues/1602

(Found this after posting here.) An attempt at conversion has already been made: https://github.com/ggerganov/llama.cpp/issues/1602#issuecomment-1570827592

zcourts commented 1 year ago

Looks like our posts overlapped! Great to hear. I've offered to provide GPU access to further the work being done in https://github.com/ggerganov/llama.cpp/issues/1602 and will follow up as that progresses.

KerfuffleV2 commented 1 year ago

There is now a working GGML example for 40B: https://github.com/ggerganov/ggml/pull/231

LLukas22 commented 1 year ago

That's great! I may create a draft, but I'd like to wait until it gets merged into ggml.

iHaagcom commented 1 year ago

Working one here https://github.com/jploski/ggml/tree/falcon40b

LLukas22 commented 1 year ago

Yeah, I noticed that. It would be great if someone could try porting it to Rust. I'm currently quite busy implementing GPU acceleration for all architectures.😬

philpax commented 1 year ago

Damn, was hoping editing the description would cancel out the issue-closing.

Anyhow - I've merged in the Falcon 7B implementation, but it doesn't handle 40B, and it requires 32-bit memory tensors because the repeat operation it uses doesn't work with 16-bit tensors. Given these caveats - and the continuing work on (one of) the original implementations in https://github.com/cmp-nct/ggllm.cpp - I've decided to merge it but disable it by default.
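To make the caveat concrete, the gating logic boils down to something like the sketch below. The names (`use_falcon`, `MemoryType`, `validate`) are illustrative, not the crate's actual API:

```rust
// Hypothetical sketch of the constraint described above: Falcon support
// is off by default, and when enabled the KV memory must be f32 because
// the repeat operation the implementation uses cannot handle f16.
#[derive(Clone, Copy, PartialEq, Debug)]
enum MemoryType {
    F16,
    F32,
}

struct ModelParams {
    use_falcon: bool,      // opt-in flag; Falcon is disabled by default
    memory_type: MemoryType,
}

fn validate(params: &ModelParams) -> Result<MemoryType, String> {
    if !params.use_falcon {
        // Non-Falcon architectures can use either memory type.
        return Ok(params.memory_type);
    }
    match params.memory_type {
        MemoryType::F32 => Ok(MemoryType::F32),
        MemoryType::F16 => Err(
            "Falcon requires f32 memory tensors: the repeat op does not support f16".into(),
        ),
    }
}

fn main() {
    let params = ModelParams { use_falcon: true, memory_type: MemoryType::F16 };
    match validate(&params) {
        Ok(mt) => println!("using memory type {:?}", mt),
        Err(e) => println!("rejected: {}", e),
    }
}
```

Once the repeat op gains f16 support upstream, the restriction (and the sketch's error branch) can go away.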

I'll keep this issue open until Falcon is truly ready to fly.

philpax commented 1 year ago

@LLukas22 should we close this or wait until the model format has stabilised?

LLukas22 commented 1 year ago

We should wait until GGUF is implemented and we have all the necessary fields in the model file.
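For reference, here's a rough sketch of the per-model metadata a Falcon loader would want out of a GGUF file. The architecture-prefixed key names follow the draft GGUF conventions, but since the format hasn't stabilised yet, treat both the keys and the struct shape as assumptions:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the hyperparameters a Falcon loader would read
// from GGUF metadata. Key names follow the draft GGUF convention of
// prefixing fields with the architecture name; they may change before
// the spec stabilises.
#[derive(Debug, PartialEq)]
struct FalconHyperparameters {
    context_length: usize,
    embedding_length: usize,
    block_count: usize,
    head_count: usize,
    head_count_kv: usize, // distinguishes 7B (multi-query, 1) from 40B (8)
}

fn load_falcon_hparams(meta: &HashMap<String, u64>) -> Option<FalconHyperparameters> {
    // Stand-in for a parsed GGUF metadata table; a real reader would also
    // check the "general.architecture" string before reading these.
    let get = |key: &str| meta.get(key).map(|&v| v as usize);
    Some(FalconHyperparameters {
        context_length: get("falcon.context_length")?,
        embedding_length: get("falcon.embedding_length")?,
        block_count: get("falcon.block_count")?,
        head_count: get("falcon.attention.head_count")?,
        head_count_kv: get("falcon.attention.head_count_kv")?,
    })
}

fn main() {
    // Values below match falcon-7b's published config.
    let mut meta = HashMap::new();
    for (k, v) in [
        ("falcon.context_length", 2048u64),
        ("falcon.embedding_length", 4544),
        ("falcon.block_count", 32),
        ("falcon.attention.head_count", 71),
        ("falcon.attention.head_count_kv", 1),
    ] {
        meta.insert(k.to_string(), v);
    }
    let hp = load_falcon_hparams(&meta).expect("missing metadata field");
    println!("{:?}", hp);
}
```

Returning `None` on any missing key is the point of waiting for GGUF: once all the fields are guaranteed present in the model file, loading becomes a straight read instead of guesswork.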