serge-chat / serge

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy-to-use API.
https://serge.chat
Apache License 2.0
5.68k stars · 405 forks

Model wishlist #217

Closed nsarrazin closed 1 year ago

nsarrazin commented 1 year ago

Hey everyone!

Just opening an issue to track which models people would like to see supported with Serge.

Are there any others you would like to see?

morpheus2448 commented 1 year ago

My list:

- Black-Engineer/llama-13b-pretrained-sft-do2-ggml-q4.bin
- Black-Engineer/oasst-llama13b-ggml-q4.bin
- Black-Engineer/oasst-llama-13b.bin
- Black-Engineer/oasst-llama-30b.bin
- eachadea/ggml-vicuna-13b-1.1-q4_1.bin
- eachadea/ggml-vicuna-13b-1.1-q4_2.bin
- eachadea/ggml-vicuna-13b-1.1-q4_3.bin
- eachadea/ggml-vicuna-7b-1.1-q4_0.bin
- eachadea/ggml-vicuna-7b-1.1-q4_1.bin
- eachadea/ggml-vicuna-7b-1.1-q4_2.bin

I'd also like to try these, but as far as I know they aren't currently supported by llama.cpp:

- mongolian-basket-weaving/oasst-stablelm-7b-sft-v7-epoch-3-ggml-q4_2.bin
- mongolian-basket-weaving/oasst-stablelm-7b-sft-v7-epoch-3-ggml-q4_3.bin

Also, models released by The Bloke are worth supporting!

kolabearafk commented 1 year ago

How about WizardLM (https://github.com/nlpxucan/WizardLM)? Could this new model be supported in Serge, please? I've heard it's very fast and generates good results.

CrazyBonze commented 1 year ago

I was going to say WizardLM; it's been getting some good reviews.

OrcVole commented 1 year ago

https://huggingface.co/reeducator/vicuna-13b-free/discussions

It tries to reduce censorship.

fishscene commented 1 year ago

Google PaLM looks interesting. https://9to5google.com/2023/05/10/google-palm-2/

However, as far as I can tell, the framework is open source and might be available. But I can't find any info on the training data. I'm also very new to AI, so I might be looking in the wrong places for the data needed to make it usable with Serge.

rendel commented 1 year ago

MPT-7B: https://github.com/mosaicml/llm-foundry

noproto commented 1 year ago

I'd really like to see support for Guanaco models. Edit: PR https://github.com/nsarrazin/serge/pull/334

ethdig commented 1 year ago

https://huggingface.co/medalpaca
https://huggingface.co/medalpaca/medalpaca-13b/tree/main
https://huggingface.co/ehartford/Wizard-Vicuna-30B-Uncensored/tree/main

Thank you

amiravni commented 1 year ago

https://huggingface.co/timdettmers/guanaco-33b-merged
https://huggingface.co/tiiuae/falcon-40b

Thanks!

Betanu701 commented 1 year ago

From my testing, it seems like the Q4_0 and Q8_0 quantizations work the best/fastest. The K-quant varieties take too long with CPU only. Honestly, I think it's a toss-up between Q4 and Q8; both seem to run about the same, within a margin of error.
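(Editor's note: the disk and memory gap between these quantizations can be roughly estimated from ggml's block layout. The sketch below assumes the legacy Q4_0/Q8_0 block format of 32 weights plus one fp16 scale, giving 4.5 and 8.5 bits per weight respectively; actual file sizes also include vocab and metadata, so treat these as ballpark figures.)

```python
def est_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size: parameters times bits per weight, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Legacy ggml blocks store 32 weights plus a 16-bit fp16 scale, so:
#   Q4_0 -> (32*4 + 16) / 32 = 4.5 bits/weight
#   Q8_0 -> (32*8 + 16) / 32 = 8.5 bits/weight
for name, bpw in [("f16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    print(f"13B {name}: ~{est_file_size_gb(13e9, bpw):.1f} GB")
# 13B f16:  ~26.0 GB
# 13B Q8_0: ~13.8 GB
# 13B Q4_0: ~7.3 GB
```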

wuast94 commented 1 year ago

https://huggingface.co/philschmid/instruct-igel-001

specked commented 1 year ago

CodeGen2.5

kagrith commented 1 year ago

> From my testing, it seems like the Q4_0 and Q8_0 quantizations work the best/fastest. The K-quant varieties take too long with CPU only. Honestly, I think it's a toss-up between Q4 and Q8; both seem to run about the same, within a margin of error.

I get this odd error when trying to run Wizard.

```
llama.cpp: loading model from /usr/src/app/weights/Wizard-30B-Q4_1.bin
error loading model: unknown (magic, version) combination: 67676a74, 00000003; is this really a GGML file?
llama_init_from_file: failed to load model
```

Did you get this? How did you resolve it?
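(Editor's note: the magic `67676a74` is ASCII `ggjt`, and `00000003` is GGJT format version 3; the error likely means the file was quantized with a newer llama.cpp than the one bundled, so updating llama.cpp, or re-quantizing to match it, usually resolves this. A hedged sketch for inspecting a file's header, using the magic constants from llama.cpp's historical GGML loaders:)

```python
import struct

# Magic values used by llama.cpp's historical GGML file loaders.
MAGICS = {
    0x67676D6C: "ggml (unversioned)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-able)",
}

def identify_ggml(path: str) -> str:
    """Read the magic (and version, if present) from a GGML-family file header."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        name = MAGICS.get(magic)
        if name is None:
            return f"not a GGML file (magic 0x{magic:08x})"
        if magic == 0x67676D6C:  # unversioned format has no version field
            return name
        (version,) = struct.unpack("<I", f.read(4))
        return f"{name}, version {version}"
```

Running this against the failing file above should report `ggjt (versioned, mmap-able), version 3`, confirming a format/loader mismatch rather than a corrupt download.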

tonyhardcode commented 1 year ago

WizardCoder would be nice to have!

https://huggingface.co/WizardLM/WizardCoder-15B-V1.0

laurentgoncalves commented 1 year ago

Very interested in the just-released Code Llama: https://ai.meta.com/blog/code-llama-large-language-model-coding/

gaby commented 1 year ago

Fixed via #866