shawwn / llama-dl

High-speed download of LLaMA, Facebook's 65B parameter GPT-style model
GNU General Public License v3.0
4.17k stars 420 forks

Just use huggingface #6

Open dustydecapod opened 1 year ago

dustydecapod commented 1 year ago

All of the models are on huggingface already. https://huggingface.co/decapoda-research

there's even an open, working pr to add support to the transformers lib.

shawwn commented 1 year ago

Sure, use whatever works. This repo is intended to serve as a point of communication about llama, and also as an extra mirror.

Note that Facebook has been issuing takedown requests against huggingface llama repositories, so those may get knocked offline.

loretoparisi commented 1 year ago

All of the models are on huggingface already. https://huggingface.co/decapoda-research

there's even an open, working pr to add support to the transformers lib.

It's worth noting that those model files have been converted for use with the HF library, so the 7B model files there are not the original release.

According to the authors, the model is in fact:

LLaMA-7B converted to work with Transformers/HuggingFace. This is under a special license, please see the LICENSE file for details.

So, supposing we want to use those model files for C++ inference, I'm not sure whether they would work.

tljstewart commented 1 year ago

@loretoparisi Yeah, I'm thinking along the same lines and trying to make sense of this. There are 8-bit and 4-bit quantized versions, the original weights, and the huggingface versions... I think the C++ inference uses the original weights, converts them to ggml format (the author's own format), and also does the quantization...?

Can this be confirmed?

Also, I am currently downloading over IPFS; the current ETA is 2d9h42m for 65B, as the magnet link in this repo seems to be down, as well as huggingface...

Any thoughts on the model formats with C++ or a way to download the weights faster?

loretoparisi commented 1 year ago

Yes, confirmed. You first convert the weights to ggml FP16 or FP32, then quantize to 4-bit and run inference (CPU only).
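As a rough sketch, the convert-then-quantize-then-infer flow described above looks like this with early-2023 llama.cpp (script and binary names have changed in later versions, and the paths below are placeholders):

```shell
# 1. Convert the original PyTorch weights to ggml FP16
#    (the trailing "1" selects FP16 output; "0" would produce FP32)
python3 convert-pth-to-ggml.py models/7B/ 1

# 2. Quantize the FP16 ggml file down to 4-bit (q4_0)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

# 3. Run CPU-only inference on the quantized model
./main -m ./models/7B/ggml-model-q4_0.bin -n 128 -p "Hello"
```

Note this starts from the original released weights, not the HF-converted checkpoints, which is why the format question above matters.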

tljstewart commented 1 year ago

Ah ok, so you're supposed to get the originally released weights and the C++ code converts them? Also, I found a torrent link for the original weights and it's going extremely fast: ETA 3 hours for 235GB.

webtorrent download o8a7xw.torrent

loretoparisi commented 1 year ago

Yes, this is exactly what I did with the download from here.

risos8200 commented 1 year ago

You can also use https://huggingface.co/huggyllama, which works with llama.cpp.
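A hedged sketch of using an HF-format checkpoint like huggyllama with llama.cpp (exact script names and output filenames depend on your llama.cpp version; later trees ship a convert.py that reads HF checkpoints directly, replacing convert-pth-to-ggml.py):

```shell
# Fetch the HF-format weights (repo path is from the comment above;
# git-lfs is needed for the large tensor files)
git lfs install
git clone https://huggingface.co/huggyllama/llama-7b models/llama-7b-hf

# Convert the HF checkpoint to a ggml file, then quantize as usual
# (check your llama.cpp version for the actual output filename)
python3 convert.py models/llama-7b-hf/
./quantize models/llama-7b-hf/ggml-model-f16.bin models/llama-7b-hf/ggml-model-q4_0.bin 2
```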