Phi3Backbone is now ready to be reviewed and merged!
I have checked numerics for both models `phi3_mini_4k_instruct_en` and `phi3_mini_128k_instruct_en`, and the mean difference between the two outputs is good for float32 and float16 (i.e. 5.2247e-08), but bfloat16 produces a lower-quality result (i.e. -0.0004). I think this is normal, isn't it?
Here is a notebook with the conversion script run on the two models: https://www.kaggle.com/code/mohamedabuelnasr/phi3-keras-conversion
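A minimal sketch of that check (not the exact notebook script; the preset and checkpoint names are assumptions, and `trust_remote_code` may be needed on the huggingface side):

```python
import numpy as np
import torch
import keras_nlp
from transformers import AutoModel

# Assumed preset/checkpoint names; run the same tokens through both backbones
# and compare the final hidden states.
keras_model = keras_nlp.models.Phi3Backbone.from_preset("phi3_mini_4k_instruct_en")
hf_model = AutoModel.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)

token_ids = np.array([[1, 32010, 13, 5328, 304, 5401, 29973]])
keras_out = keras_model(
    {"token_ids": token_ids, "padding_mask": np.ones_like(token_ids)}
)
with torch.no_grad():
    hf_out = hf_model(input_ids=torch.tensor(token_ids)).last_hidden_state

print(np.mean(np.asarray(keras_out) - hf_out.numpy()))
```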
Thanks for the work on this @abuelnasr0, this is awesome!
> I have checked numerics for both models `phi3_mini_4k_instruct_en` and `phi3_mini_128k_instruct_en`, and the mean difference between the two outputs is good for float32 and float16 (i.e. 5.2247e-08), but bfloat16 produces a lower-quality result (i.e. -0.0004). I think this is normal, isn't it?
Yes, that's pretty good. Are these absolute tolerance values or relative? Either way, it's below the machine precision of 32-bit and 16-bit floating-point values, so they definitely match!
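For reference, the machine epsilon of each dtype puts those mean differences in context (a quick check; `ml_dtypes` is an assumption here, used only to query bfloat16):

```python
import numpy as np
import ml_dtypes  # assumption: only used to look up bfloat16's epsilon

# Machine epsilon per dtype, for comparison with the reported mean differences:
print(np.finfo(np.float32).eps)          # ~1.19e-07 (vs. 5.2247e-08 above)
print(np.finfo(np.float16).eps)          # ~9.77e-04
print(np.finfo(ml_dtypes.bfloat16).eps)  # ~7.81e-03 (vs. -0.0004 above)
```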
@tirthasheshpatel
> Are these absolute tolerance values or relative?
It was supposed to calculate the absolute difference, but when I took another look at the function I found that I was calculating the mean of the difference, not the mean absolute difference. I have corrected it and run the conversion script again. The results became worse but are still acceptable for float32 (3.0725e-06) and float16 (0.0046); for bfloat16 they are bad (0.0254 for the 128k model and 0.0469 for the 4k model).
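The difference between the two metrics, in a toy example:

```python
import numpy as np

# Signed errors cancel in a plain mean, which made the earlier numbers look
# better than they really were:
a = np.array([0.10, -0.10, 0.20, -0.20])
b = np.zeros(4)
print(np.mean(a - b))          # 0.0  -- signed differences cancel out
print(np.mean(np.abs(a - b)))  # 0.15 -- the actual mean absolute difference
```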
The model is ready now, and the output matches the huggingface output for the two models. But there is a problem with the tokenizer, described here:
text = "<|user|>\nHow to win?<|end|>\n<|assistant|>"

# The output after adding the special tokens as user_defined_symbols to the sentencepiece model.
keras_nlp  : [1, 29871, 32010, 13, 5328, 304, 5401, 29973, 32007, 13, 32001]

# Same as keras_nlp but without adding '▁' at the beginning. Can be configured in the spm model.
llama_cpp  : [1, 32010, 13, 5328, 304, 5401, 29973, 32007, 13, 32001]

# Removes '\n' (LF token) completely.
# Adds '▁' at the beginning (if the text starts with a non-special token) and after each special token.
hf_fast_tok: [1, 32010, 1128, 304, 5401, 29973, 32007, 32001]

# Removes '\n' (LF token) completely. Adds '▁' at the beginning.
# Same as keras_nlp, but only when the text doesn't contain '\n'.
hf_tok     : [1, 29871, 32010, 5328, 304, 5401, 29973, 32007, 32001]
The huggingface output should match the sentencepiece output after adding the special tokens, but huggingface handles special tokens outside the sentencepiece library, which is why the outputs don't match.
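A sketch that should reproduce the comparison above (the preset and checkpoint names are assumptions; note that in keras_nlp the BOS token is added by the preprocessor, so the raw tokenizer output may omit the leading 1):

```python
import keras_nlp
from transformers import AutoTokenizer

text = "<|user|>\nHow to win?<|end|>\n<|assistant|>"

keras_tok = keras_nlp.models.Phi3Tokenizer.from_preset("phi3_mini_4k_instruct_en")
hf_fast = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", use_fast=True)
hf_slow = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", use_fast=False)

print(keras_tok(text))             # special tokens handled inside sentencepiece
print(hf_fast(text)["input_ids"])  # special tokens split out before tokenizing
print(hf_slow(text)["input_ids"])
```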
`LlamaTokenizer` and `LlamaTokenizerFast` in huggingface aren't consistent. But if we have to match the huggingface output, we should try to match `LlamaTokenizerFast`, as it is used in the example on the official model page. That will require a lot of workarounds, for example like in https://github.com/keras-team/keras-nlp/pull/1445; one possible approach is sketched below.
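For illustration, one way to handle special tokens outside sentencepiece, similar in spirit to what huggingface does: split the text on the special tokens and only run the plain segments through the spm model. `special_token_ids` and `spm_tokenize` are hypothetical placeholders, not keras_nlp APIs.

```python
import re

def tokenize_with_special_tokens(text, special_token_ids, spm_tokenize):
    # special_token_ids: dict mapping special-token strings to their ids
    # spm_tokenize: callable that tokenizes plain text with the spm model
    pattern = "(" + "|".join(re.escape(t) for t in special_token_ids) + ")"
    ids = []
    for piece in re.split(pattern, text):
        if piece in special_token_ids:
            ids.append(special_token_ids[piece])  # emit the special token's id directly
        elif piece:
            ids.extend(spm_tokenize(piece))       # tokenize plain text normally
    return ids

# Demo with a stand-in tokenizer (real code would call the spm model):
print(tokenize_with_special_tokens(
    "<|user|>\nHow to win?<|end|>",
    {"<|user|>": 32010, "<|end|>": 32007},
    lambda s: [0] * len(s.split()),
))
```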
NOTE: The generations match in the screenshots because I used a text that is tokenized the same in keras_nlp and in huggingface's `LlamaTokenizer`.
Copying the presets over now on Kaggle. I will pull this in today.
Updated links, though I think things are still processing on the Kaggle side.
This PR adds the Phi-3 model. The PR is not ready for merge yet; I still need to:

- Add `Phi3SuScaledRotaryEmbedding` for microsoft/Phi-3-mini-128k-instruct (a rough sketch of the idea follows below).

EDIT: It's ready for review now.
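For context, a rough sketch of what the su-scaled rotary embedding does, based on my reading of the huggingface Phi-3 implementation; the per-dimension factor arrays are learned per checkpoint, so the values here are placeholders:

```python
import math
import numpy as np

def su_scaled_rope_params(head_dim, rope_theta, factors, max_pos, original_max_pos):
    # Standard RoPE inverse frequencies ...
    inv_freq = 1.0 / (rope_theta ** (np.arange(0, head_dim, 2) / head_dim))
    # ... rescaled per rotary dimension by the checkpoint's short/long factors.
    inv_freq = inv_freq / np.asarray(factors)
    # When the context is extended past the original training length, the
    # embedding is additionally scaled by a slow-growing attention factor.
    scale = max_pos / original_max_pos
    attn_factor = 1.0 if scale <= 1.0 else math.sqrt(
        1 + math.log(scale) / math.log(original_max_pos)
    )
    return inv_freq, attn_factor

# Placeholder factors (real checkpoints ship one learned value per rotary dim,
# i.e. 48 values for head_dim=96):
inv_freq, attn_factor = su_scaled_rope_params(
    head_dim=96, rope_theta=10000.0, factors=np.ones(48),
    max_pos=131072, original_max_pos=4096,
)
```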