Closed: matthoffner closed this issue 1 year ago.
I'm also waiting for that PR to be merged. Hopefully it will be merged this weekend: https://github.com/ggerganov/ggml/pull/231#issuecomment-1594411046
Thanks, do you know if it is possible to point ctransformers to a branch of ggml for testing?
+1 to this
However, I don't think the ggml PR is the one to implement. Instead, I would use the new implementation in ggllm.cpp: https://github.com/cmp-nct/ggllm.cpp
This is now the best Falcon GGML implementation, including CUDA GPU acceleration with support for both 7B and 40B models.
I don't know if this will end up also being in the GGML repo, or maybe even eventually the llama.cpp repo (as ggllm.cpp is a fork of llama.cpp).
But either way, this is the Falcon implementation of interest right now.
And I wonder whether there's even a need to wait for it to be fully stable. It's already useful and being used by people. I have four Falcon GGML repos now.
If ctransformers supported this I think it would help accelerate the use of Falcon GGML.
@matthoffner It is not possible to point ctransformers to a branch of ggml as the model code has to be modified to integrate into the common interface I provide for all models.
Thanks @TheBloke. I was waiting for the PR to be merged, but since you are already providing the files, I added experimental support for Falcon models using the ggllm fork in the latest version, 0.2.10.
It has CUDA support similar to the LLaMA models. I tested with the 7B model, but my machine doesn't have enough memory for the 40B model.
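For anyone wanting to try this, here is a minimal sketch of loading one of TheBloke's Falcon GGML repos with ctransformers 0.2.10. The gpu_layers value and the prompt are only illustrative, and gpu_layers is assumed to behave for Falcon the same way it does for the LLaMA models:

```python
from ctransformers import AutoModelForCausalLM

# Sketch: load a Falcon GGML model with ctransformers >= 0.2.10.
# gpu_layers is assumed to work as it does for LLaMA models (number of
# layers offloaded to the GPU); use 0 for CPU-only inference.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-7b-instruct-GGML",
    model_type="falcon",
    gpu_layers=32,
)

print(llm("Falcon is a family of large language models that"))
```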
Fantastic! That's great news, thank you marella. That was super quick.
I will update my READMEs to mention this.
@ParisNeo could you check if this works automatically in LoLLMS, and if so maybe add some Falcon GGML entries? Then I will mention also in the README, and you will be the first UI to support Falcon GGML! :)
https://discord.com/channels/@me/1097295255801442306/1121584559138545704
I am using 0.2.10.
Am I missing something?
You should use model_type="falcon":
llm = AutoModelForCausalLM.from_pretrained("TheBloke/falcon-7b-instruct-GGML", model_type="falcon")
@marella @TheBloke Thank you!! I think I've got the 40B running with GPU on an HF Space.
I've added config.json to my four repos, so manual model_type selection by the client shouldn't be needed from now on.
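In other words, with config.json present in the repo, loading should reduce to something like the sketch below, assuming ctransformers reads model_type from that file:

```python
from ctransformers import AutoModelForCausalLM

# Sketch: with config.json providing model_type in the repo, it is assumed
# that model_type no longer needs to be passed explicitly.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/falcon-7b-instruct-GGML")
print(llm("Hello, my name is"))
```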
Thank you very much for this nice work. I tested the 7B model on my PC and it is really solid, even compared to 13B models of other families. @marella, do you have a Twitter account so I can follow you and add a link when I credit your work?
Yeah, I was wondering about that too.
Hey, I don't have a Twitter account. I'm on LinkedIn (https://www.linkedin.com/in/ravindramarella/), but I don't post anything there. If you want to link, you can just link to this repo.
Ok. Very nice profile by the way. Nice to meet you.
I've been tracking the Falcon PR (https://github.com/ggerganov/ggml/pull/231), and as I understand it, it currently won't work on a released version of ggml. Any suggestions on how to test it config-wise are appreciated; I'm assuming the llama model type might not work, based on other PRs.