michaelfeil / hf-hub-ctranslate2

Connecting Transformers on HuggingFace Hub with CTranslate2
https://michaelfeil.github.io/hf-hub-ctranslate2/
MIT License

There's some kind of problem with the models that michaelfeil converts to ctranslate2 format. Sorry... #14

Closed: BBC-Esq closed this issue 10 months ago

BBC-Esq commented 1 year ago

It keeps giving me an error message of:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory C:\LLMs\bge-base-en-ct2-int8_float16.

It gives me this after the model has been converted to the ctranslate2 format. I've tried converting the original model myself, as well as cloning the repo that michaelfeil made for the same exact model (after he converted it). I've also tried downloading the full-size model and then converting it with his exact command; nothing works. Also, I noticed that some of his commands write the output to a temporary directory, which seems odd.
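
For reference, the conversion step itself usually looks something like the sketch below, using CTranslate2's Python converter API; the exact command in Michael's model cards may differ:

# Conversion sketch, assuming ctranslate2 and transformers are installed.
# The output directory below mirrors the one in the error message above.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("BAAI/bge-base-en")
converter.convert(
    r"C:\LLMs\bge-base-en-ct2-int8_float16",
    quantization="int8_float16",
    force=True,  # overwrite the output directory if it already exists
)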

I struggled with this for 3-4 hours today, because it's crucial for my project, which is based on ctranslate2...

After receiving the above error, I even renamed `model.bin` to `pytorch_model.bin` and got a new error:

OSError: Unable to load weights from pytorch checkpoint file for 'C:\LLMs\bge-base-en-ct2-int8_float16\pytorch_model.bin' at 'C:\LLMs\bge-base-en-ct2-int8_float16\pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
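
That second error is expected: CTranslate2's model.bin is its own binary format, not a pickled PyTorch checkpoint, so renaming it cannot make transformers load it. A converted encoder has to be opened through ctranslate2 itself. A rough sketch, assuming a ctranslate2 version that provides ctranslate2.Encoder, with the directory taken from the error message:

# Loading sketch: open the converted directory with ctranslate2 and
# fetch the tokenizer from the original HF repo (it is not converted).
import ctranslate2
from transformers import AutoTokenizer

model_dir = r"C:\LLMs\bge-base-en-ct2-int8_float16"
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en")
encoder = ctranslate2.Encoder(model_dir, device="cpu")

ids = tokenizer(["I like soccer"]).input_ids
output = encoder.forward_batch(ids)  # hidden states / pooler output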

I will pay someone $100, seriously, or whatever, if you solve this issue for me. I have a slew of embedding models that I need to use, and it's crucial for my program that I use ctranslate2 instead of the full-size models for VRAM reasons...

One last thing: to make sure I wasn't going crazy, I downloaded one of Michael's llama2-13b models converted to ctranslate2. It didn't work until I went to the original repository of the full-size model, downloaded the original "tokenizer.model" file, and put it in the folder for his quantized/ctranslate2 model. Then it worked. I knew to do this from converting my own models (another 3-4 hours, long story). Anyway, this is why GGML and GPTQ are insanely more popular, even though ctranslate2 is FAR SUPERIOR in my humble opinion. Have you seen the whisper benchmarks? I can additionally verify this because I'm starting to use the inference models (e.g. llama2) for chat purposes locally. The community is near non-existent, though; Guillikian is sparse; hardly anyone is working with this superior technology.
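
To make the tokenizer.model point concrete, here is roughly what running one of the converted llama2 models looks like; the directory name "llama2-13b-ct2" is hypothetical:

# Sketch: generate with a converted llama2 model via ctranslate2.
# tokenizer.model is the sentencepiece file copied from the original repo.
import ctranslate2
import sentencepiece as spm

generator = ctranslate2.Generator("llama2-13b-ct2", device="cuda")
sp = spm.SentencePieceProcessor(model_file="llama2-13b-ct2/tokenizer.model")

prompt_tokens = ["<s>"] + sp.encode_as_pieces("What is CTranslate2?")
results = generator.generate_batch([prompt_tokens], max_length=128, sampling_topk=10)
print(sp.decode(results[0].sequences_ids[0]))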

Anyhow, I'm grateful that Michael created hf_hub_ctranslate2; it's a godsend. But c'mon, can I get any help from any of the community members?

michaelfeil commented 1 year ago

Hi @BBC-Esq, I am not sure what the problem is here:

* `pytorch_model.bin` should not be needed to run. Once the model is converted with ctranslate2, pytorch is not needed to run the forward pass.

I assume you are using a model like https://huggingface.co/BAAI/bge-base-en for an embedding task, just like any other sentence-embedding model:

from hf_hub_ctranslate2 import CT2SentenceTransformer

# Converts the PyTorch model to CTranslate2 and wraps it behind the
# familiar SentenceTransformer interface.
model_name_pytorch = "intfloat/e5-small"
model = CT2SentenceTransformer(
    model_name_pytorch, compute_type="int8", device="cuda",
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
# With normalized embeddings, this is cosine similarity scaled to 0-100.
scores = (embeddings @ embeddings.T) * 100

Generally, models that have the BertModel architecture should be able to do this; see https://huggingface.co/BAAI/bge-base-en/blob/main/config.json.
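
A quick way to check is to read the "architectures" field of the model's config.json (a trivial sketch, assuming the file has been downloaded locally):

# Print the architecture list from a local config.json.
import json

with open("config.json") as f:
    config = json.load(f)
print(config.get("architectures"))  # e.g. ["BertModel"] for BAAI/bge-base-en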

BBC-Esq commented 1 year ago

I will try this out.