pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
BSD 3-Clause "New" or "Revised" License

What would it take to support other models like deepseek coder? #35

Closed: briandw closed this 6 months ago

briandw commented 7 months ago

This is an amazing project and it would be great to support other models.

I've been looking at using DeepSeek with gpt-fast. DeepSeek is in the Llama 2 architecture family. I've gotten as far as converting the model and replacing the tokenizer. I can run the model, but the output isn't correct. I think there are some differences in architecture, but I can't tell whether they are a problem.

I think I have the correct parameters: `"deepseek-coder-6.7b-base": dict(block_size=16384, vocab_size=32256, intermediate_size=11008, norm_eps=1e-6, rope_base=100000)`

So the model converts and runs, but the output is gibberish. Could there be something wrong in the conversion step? I can't tell what all the key mapping is for, so I don't know whether it's working correctly.

Any suggestions on what to do next?

briandw commented 6 months ago

I got it working. The main issue was that DeepSeek uses LlamaLinearScalingRotaryEmbedding. I added a scaling factor to the precompute_freqs_cis function and it works! I also had to replace the tokenizer and change a few details related to that. If I could figure out how to convert the Llama tokenizer to a SentencePiece model file, I think the scaling factor would be the only change needed.
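For anyone hitting the same wall: linear RoPE scaling just divides the position indices by a constant before computing the rotary frequencies. A minimal sketch of the change, modeled on gpt-fast's precompute_freqs_cis (the `scaling_factor` parameter is the addition; treat the exact signature as illustrative):

```python
import torch
from torch import Tensor

def precompute_freqs_cis(seq_len: int, n_elem: int, base: int = 10000,
                         scaling_factor: float = 1.0) -> Tensor:
    # Standard RoPE inverse frequencies per pair of head dimensions.
    freqs = 1.0 / (base ** (torch.arange(0, n_elem, 2)[: (n_elem // 2)].float() / n_elem))
    t = torch.arange(seq_len, device=freqs.device)
    # Linear scaling (LlamaLinearScalingRotaryEmbedding): divide the
    # positions so a longer context maps onto the trained position range.
    t = t / scaling_factor
    freqs = torch.outer(t, freqs)
    freqs_cis = torch.polar(torch.ones_like(freqs), freqs)
    return torch.stack([freqs_cis.real, freqs_cis.imag], dim=-1)
```

If I'm reading DeepSeek Coder's Hugging Face config right, it uses a linear scaling factor of 4.0 on top of rope_base=100000.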

Does anyone know how to go about converting the Llama tokenizer to a SentencePiece model?

Chillee commented 6 months ago

@briandw The tokenizer interface is pretty simple. https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L201

Basically, given a tensor of integers, you need a way of converting it to a string. And given the string, you need a way of converting it to a tensor of integers.
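In other words, any object exposing encode/decode in that shape can stand in. A minimal sketch of the duck-typed interface (the class and the toy byte-level scheme are purely illustrative, not gpt-fast's actual tokenizer):

```python
from typing import List
import torch

class SimpleTokenizer:
    """Illustrative stand-in: any object with these two methods can
    play the tokenizer role that generate.py expects."""

    def encode(self, text: str) -> List[int]:
        # Map a string to token ids (here: a toy byte-level scheme).
        return list(text.encode("utf-8"))

    def decode(self, ids) -> str:
        # Map token ids (or a tensor of them) back to a string.
        if isinstance(ids, torch.Tensor):
            ids = ids.tolist()
        return bytes(ids).decode("utf-8", errors="replace")
```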

I also think the current tokenizer is already a SentencePieceProcessor? https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L285

briandw commented 6 months ago

@Chillee Thanks for your response. I understand that it's just tokens to ids, but I was hoping to be able to use the model without code changes. LlamaTokenizer isn't an exact drop-in replacement, but it's pretty close. I'm just going to remove SentencePiece and use the LlamaTokenizer instead.
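For anyone following the same route, loading the tokenizer shipped with DeepSeek's checkpoint through Hugging Face looks roughly like this (the model id is assumed from the hub; whether it slots into generate.py unchanged depends on the bos/eos handling there):

```python
from transformers import AutoTokenizer

# Load the tokenizer bundled with the DeepSeek Coder checkpoint.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")

ids = tokenizer.encode("def fib(n):")   # str -> list[int]
text = tokenizer.decode(ids)            # list[int] -> str
```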