Closed briandw closed 6 months ago
I got it working. The main issue was that DeepSeek uses LlamaLinearScalingRotaryEmbedding. I added a scaling factor to the precompute_freqs_cis function and it works! I also had to replace the tokenizer and change a few details related to that. If I could figure out how to convert the Llama tokenizer to a SentencePiece model file, I think the scaling factor would be the only change needed.
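For anyone following along, here is a minimal, dependency-free sketch of what linear RoPE scaling does to the frequency precomputation. The function name and shape loosely follow gpt-fast's precompute_freqs_cis, but this is illustrative pseudocode of the math, not a drop-in patch; `scaling_factor` is the addition described above.

```python
import math

def precompute_rope_angles(seq_len: int, n_elem: int,
                           base: float = 10000.0,
                           scaling_factor: float = 1.0) -> list[list[float]]:
    # One inverse frequency per pair of rotary dimensions.
    inv_freq = [1.0 / (base ** (i / n_elem)) for i in range(0, n_elem, 2)]
    # Linear scaling (what LlamaLinearScalingRotaryEmbedding does):
    # divide each position index by scaling_factor before forming the
    # position/frequency outer product, stretching the position space.
    return [[(pos / scaling_factor) * f for f in inv_freq]
            for pos in range(seq_len)]
```

With scaling_factor=2.0, position 2 gets the same rotation angles that position 1 would get unscaled, which is what lets the model address a longer context.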
Does anyone know how to go about converting the Llama tokenizer to a SentencePiece model?
@briandw The tokenizer interface is pretty simple. https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L201
Basically, given a tensor of integers, you need a way of converting it to a string. And given the string, you need a way of converting it to a tensor of integers.
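To make that interface concrete, here is a toy sketch of the two methods gpt-fast actually relies on. The whitespace "vocabulary" is purely illustrative; a real replacement would wrap LlamaTokenizer or SentencePieceProcessor behind the same encode/decode pair.

```python
class ToyTokenizer:
    """Illustrative stand-in with the encode/decode shape gpt-fast uses."""

    def __init__(self, vocab: list[str]):
        self.id_to_token = list(vocab)
        self.token_to_id = {t: i for i, t in enumerate(self.id_to_token)}

    def encode(self, text: str) -> list[int]:
        # string -> list of token ids (real tokenizers do subword splitting)
        return [self.token_to_id[t] for t in text.split()]

    def decode(self, ids: list[int]) -> str:
        # list of token ids -> string
        return " ".join(self.id_to_token[i] for i in ids)
```

Any object that round-trips strings through integer ids this way can be dropped in where the SentencePieceProcessor is used.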
I also think the current tokenizer is already a SentencePieceProcessor? https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L285
@Chillee Thanks for your response. I understand that it's just tokens to ids, but I was hoping to use the model without code changes. LlamaTokenizer isn't an exact replacement, but it's pretty close. I'm just going to remove SentencePiece and use the LlamaTokenizer.
This is an amazing project and it would be great to support other models.
I've been looking at using DeepSeek with gpt-fast. DeepSeek is in the Llama 2 family. I've gotten as far as converting the model and replacing the tokenizer. I can run the model, but the output isn't correct. I think there are some differences in architecture, but I can't tell whether they are the problem.
I think I have the correct parameters: `"deepseek-coder-6.7b-base": dict(block_size=16384, vocab_size=32256, intermediate_size=11008, norm_eps=1e-6, rope_base=100000)`
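For context, this is the shape of the entry as it would sit in gpt-fast's transformer_configs dict in model.py (the dict name and lookup behavior are from my reading of the repo; the values are the ones quoted above):

```python
# Sketch of registering the DeepSeek config alongside gpt-fast's built-ins.
transformer_configs = {
    "deepseek-coder-6.7b-base": dict(
        block_size=16384,        # max sequence length
        vocab_size=32256,
        intermediate_size=11008, # FFN hidden dim
        norm_eps=1e-6,
        rope_base=100000,        # DeepSeek uses a larger RoPE base than Llama
    ),
}
```

If I understand the loader correctly, gpt-fast selects the config by matching the key against the checkpoint directory name, so the converted weights need to live under a path containing this string.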
So the model converts and runs, but the output is gibberish. Could there be something wrong in the conversion step? I can't tell what all the key mapping is for, so I don't know whether it's working correctly.
Any suggestions on what to do next?