marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

Support Microsoft Guidance #13

Open vmajor opened 1 year ago

vmajor commented 1 year ago

I am trying to use a 'custom tokenizer' but I am unable to see how to invoke it. Also, can we use a standard tokenizer from HF by pulling it or loading it from a local path?

vmajor commented 1 year ago

Never mind, I can just load it from transformers...

marella commented 1 year ago

Yes, a custom/HF tokenizer can be used with the generate() method:

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

tokens = tokenizer.encode('AI is going to')

for token in llm.generate(tokens):
    print(tokenizer.decode(token))

Please let me know if you were able to use this library with your custom/HF tokenizer.

vmajor commented 1 year ago

I am actually trying to use ctransformers with Microsoft guidance, but I am encountering an error with protobuf. I posted a bug report there, but I am unsure what is happening.

EDIT: changed the code, as it was a massive brainfart...

import guidance
from ctransformers import AutoModelForCausalLM
from transformers import LlamaTokenizer
from transformers import AutoTokenizer

# we will use LLaMA for most of the examples in this tutorial
path = '/home/vmajor/models/gpt4-alpaca-lora_mlp-65B'
llm = AutoModelForCausalLM.from_pretrained('/home/vmajor/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin', model_type='llama')
tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
#print(llm('AI is going to'))
guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[16], line 11
      9 tokenizer = LlamaTokenizer.from_pretrained('/home/vmajor/models/llama-tokenizer-65b/tokenizer.model')
     10 #print(llm('AI is going to'))
---> 11 guidance.llm = guidance.llms.transformers.LLaMA(llm, tokenizer, device="cpu")

File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/guidance/llms/_transformers.py:42, in Transformers.__init__(self, model, tokenizer, caching, token_healing, acceleration, temperature, device, **kwargs)
     40 self.acceleration = acceleration
     41 if device is not None: # set the device if requested
---> 42     self.model_obj = self.model_obj.to(device)
     43 self.device = self.model_obj.device # otherwise note the current device
     45 self._prefix_ids = [self._tokenizer.bos_token_id, 100] # token ids that we use to decode tokens after a prefix

File ~/anaconda3/envs/guidance/lib/python3.10/site-packages/ctransformers/llm.py:197, in LLM.__getattr__(self, name)
    195 if name.startswith('ctransformers_llm_') and hasattr(lib, name):
    196     return partial(getattr(lib, name), llm)
--> 197 raise AttributeError(f"'LLM' object has no attribute '{name}'")

AttributeError: 'LLM' object has no attribute 'to'

So now the error is with ctransformers, but again I am uncertain whether it is a real error or just due to me trying something rather strange: mixing a model with a tokenizer that may not be the correct one, and trying to plug ctransformers and transformers implementations into guidance().

marella commented 1 year ago

I haven't used the guidance library, but the guidance.llms.transformers.LLaMA class expects an HF transformers object and you are passing a ctransformers object, so it won't work. It looks like there is already an issue open to add support for ggml models: https://github.com/microsoft/guidance/issues/58

vmajor commented 1 year ago

Yes, I saw that thread, but progress slowed down. I would really like the ability to leverage open-source community efforts alongside what comes out of commercial or well-funded (e.g. HF) groups. ctransformers sounds to me like a great way to achieve that, but perhaps I am misunderstanding the drivers and philosophy behind it. Ideally I would love to be able to do what the other people in the thread were suggesting: just drop a local quantized model in place of an HF-hosted model. That way, independent users, developers and tinkerers like us can plug into much better resourced projects.

marella commented 1 year ago

I would also like to add support for it, but guidance doesn't seem to have documentation on how to add new models. I will try to follow this example and see if I can make it work. I will look into it next weekend.

bluecoconut commented 1 year ago

@marella Any update on this? I'm looking forward to using StarChat-ggml weights in guidance via ctransformers~

I will take a stab at this later this week, but I don't want to repeat work, especially since you might have already spent time on this. Were there any gotchas or difficulties that I could maybe help with?

marella commented 1 year ago

Hey, I implemented a 🤗 Transformers compatible model and tokenizer using ctransformers and was able to run one of the examples, but I think it still has some bugs. I will push the code to GitHub later this week and let you know. I'm trying to make it work like this:

model = # ctransformers model
tokenizer = # ctransformers tokenizer
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)

bluecoconut commented 1 year ago

Hey @marella, any update on this? I like the idea of having a transformers-compatible model and tokenizer object for using ctransformers. I'd love to try it out; if you make a public branch with your work (even with the bugs), I can start from there, see where it fails for me, and offer some fixes if that's helpful.

marella commented 1 year ago

Hi, I pushed the changes to the guidance branch. You can install it using:

git clone https://github.com/marella/ctransformers
cd ctransformers
git checkout guidance
pip install -e .

and use it as:

import guidance
from ctransformers import AutoModelForCausalLM
from ctransformers.transformers import Model, Tokenizer

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
tokenizer = Tokenizer(llm)
model = Model(llm)
llm = guidance.llms.Transformers(model=model, tokenizer=tokenizer)

I fixed some of the bugs. It is working (finishing without errors) with guidance 0.0.61 but gets stuck on the latest version.
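
For reference, a minimal sketch of actually running a guidance program against it, continuing from the snippet above (guidance 0.0.x template syntax; the prompt, variable name and max_tokens here are placeholders, not from this thread):

# `llm` is the guidance.llms.Transformers object created in the snippet above
program = guidance("AI is going to {{gen 'completion' max_tokens=16}}", llm=llm)
result = program()           # executes the program and fills in the template
print(result['completion'])  # the generated text is available by variable name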

qeternity commented 1 year ago

Thanks very much for this @marella - for the life of me, though, I can't figure out why it hangs at around 450 tokens.

qeternity commented 1 year ago

Ok, never mind, it appears to be an issue with ctransformers_llm_context_length returning an incorrect 512 for a llama model. I've overridden it and everything is working now.

In doing some benchmarking, llm.eval is much slower than llama.cpp.
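
If it helps anyone hitting the same 512 limit, here is one possible way to override it at load time, a sketch assuming the context_length config option in ctransformers (the model path and value are placeholders):

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    'path/to/llama-model.ggmlv3.q5_1.bin',
    model_type='llama',
    context_length=2048,  # override the incorrect default reported for the model
)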

kw2828 commented 1 year ago

@marella thanks for putting a branch together. I quickly tried to put together a prototype using your HF model above (marella/gpt-2-ggml) and guidance==0.0.61, but it's hanging around _stream_then_save. Any thoughts / tweaks I can make?

Here's a colab to reproduce: https://colab.research.google.com/drive/1YzBvp97pLwAdfl7tlKwCtYH2DZhigXiI?usp=sharing

Jchang4 commented 1 year ago

@marella seconded! This would be killer

lucasjinreal commented 1 year ago

Hello, I would like to raise an issue with Japanese characters decoding incorrectly when decoding in streaming mode. Can anybody help with how to fix it? (This happens because the llama tokenizer vocabulary is small, so Japanese characters need more than one token to decode correctly.)

for example: ��当时年少,��
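
One common workaround is to buffer the streamed tokens and only print once they decode to complete text. This is only a sketch, not something built into ctransformers, and it assumes the decode path emits the replacement character '\ufffd' for incomplete multi-byte sequences, as in the garbled output above; the model, tokenizer and prompt are the ones from the earlier example:

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

buffer = []
for token in llm.generate(tokenizer.encode('AI is going to')):
    buffer.append(token)
    text = tokenizer.decode(buffer)
    if text.endswith('\ufffd'):
        continue  # incomplete multi-byte character; wait for more tokens
    print(text, end='', flush=True)
    buffer = []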

barinov274 commented 10 months ago

(quoting marella's earlier comment about installing from the guidance branch and the usage example)

Mistral doesn't work with guidance: Model type 'mistral' is not supported.