wiseyy opened 8 months ago
@NouamaneTazi do we have a conversion script from transformers to a nanotron checkpoint?
Any updates? @xrsrke
@wiseyy I'm facing a similar challenge. Any way we can join forces on this and try to make it work? :)
Glad to know I'm not alone :)
I already chose the easier route of using Megatron-LLM and Meditron. The training throughput, however, is ~2/3 of what nanotron provides. Also, you have to convert the weights back to HF format after you finish training and run inference using HF/vLLM.
I hope that helps you.
@wiseyy unfortunately I can't go the megatron route (I'm part of a group and we already committed ourselves to nanotron).
Conversion is straightforward.
Can you help me get started with this? Maybe if I can reproduce your errors, I'll be able to dig deeper into the issue.
Hey all, I was wondering if there are any conversion scripts yet?
Hello, you could use this: https://github.com/huggingface/nanotron/tree/main/examples/llama
I've noticed the test for consistent logits is commented out in the above conversion scripts: https://github.com/huggingface/nanotron/blob/03d67f2103d5be0dc15ea6022a6cf16d6a633064/examples/llama/tests/test_conversion.py#L223
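If you want to re-run that check by hand, here is a minimal sketch. The HF side uses the standard transformers API; the model id is only an example, and how you obtain raw logits from the converted nanotron checkpoint (e.g. by adapting `run_generate.py`) is left as a placeholder.

```python
# Sketch of the consistency check the commented-out test performs: feed the same
# token ids to both models and compare the logits. Only the HF side is filled in;
# the nanotron side depends on your setup and is left as a comment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # example model id (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float32)
hf_model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    hf_logits = hf_model(input_ids).logits  # shape [1, seq_len, vocab_size]

# nanotron_logits = ...  # run the converted checkpoint on the same input_ids
# print("max abs diff:", (hf_logits - nanotron_logits).abs().max().item())
```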
Also running into this problem of differing logits - any potential solutions?
Also, given a nanotron checkpoint, how do we continue training on it? The above examples only show loading a model for inference, not for continued training. `DistributedTrainer` only takes config files in its `__init__` function, so how do we use it with an in-memory model (ideally the converted Llama model) and tokenizer?
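One workaround, rather than passing an in-memory model, seems to be pointing the training config at the directory produced by the HF-to-nanotron conversion and launching training the same way the examples do. A rough sketch follows; the config field name for resuming from a checkpoint is an assumption, so check nanotron's config dataclasses for your version.

```python
# Rough sketch: continue training from a converted checkpoint by referencing it
# in the YAML config instead of injecting an in-memory model.
# Assumption: the config has a field for the checkpoint to resume from
# (e.g. something like checkpoints.resume_checkpoint_path); verify against
# nanotron's config dataclasses for your version.
from nanotron.trainer import DistributedTrainer

CONFIG_FILE = "config_llama_continue.yaml"  # hypothetical config whose checkpoint
                                            # path points at the converted weights

trainer = DistributedTrainer(CONFIG_FILE)   # DistributedTrainer only accepts a config file
# Build the dataloader the same way the repo's run_train.py does, then:
# trainer.train(dataloader)
```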
Following up on https://github.com/huggingface/nanotron/issues/78#issue-2147747937: I converted the weights as you mentioned, but unfortunately I cannot get the same sane outputs from the pre-trained Llama weights as I get with the HF API. I am trying to figure out why that is happening. The conversion is straightforward except for nanotron's gate_up and qkv weights, since the structure of those weights is not documented. I assume that concatenating the HF weights along dimension 0, in the order (gate, up) and (q, k, v), should give the same behaviour for the nanotron weights.
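To make that assumption concrete, here is a rough sketch of the mapping I mean (plain row-wise concatenation, i.e. the TP=1 case; with tensor parallelism the fused weights are sharded per rank, so the layout could differ):

```python
# Rough sketch of the assumed mapping: HF's separate projections concatenated
# along dim 0 into nanotron's fused tensors. This ignores tensor parallelism,
# where the fused weights are sharded per rank and may be laid out differently.
import torch

def fuse_qkv(q_proj: torch.Tensor, k_proj: torch.Tensor, v_proj: torch.Tensor) -> torch.Tensor:
    # each HF weight is [out_features, hidden_size]; the fused qkv stacks them row-wise
    return torch.cat([q_proj, k_proj, v_proj], dim=0)

def fuse_gate_up(gate_proj: torch.Tensor, up_proj: torch.Tensor) -> torch.Tensor:
    # fused gate_up projection in the order (gate, up)
    return torch.cat([gate_proj, up_proj], dim=0)
```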
The sources of error I could think of are (assuming there is no bug in `run_generate.py`):
Could you please help me out?
Update: the outputs look somewhat sane, but they are still far from acceptable.
Here, for example, the model starts to produce coherent text but then drifts into gibberish. This leads me to believe that the weight mapping is correct and that there is some error in the generation code.
I want to point out that you are not passing the arguments to the sampler in the `decode_text` function in `generation/decode.py`. The outputs above were generated using `decode_tokenized()`, which does pass them. The `GenerationArgs` were as follows:
The output that the HF API generates for the same weights and input tokens is as follows:
The quality is a lot better than the text generated by nanotron.
Also, when I try to prompt the 7b-chat version with a system prompt and user input (the default way), the nanotron output breaks down altogether.
This is HF->
This is nanotron->