Closed: yardenas closed this 6 months ago
As an outsider, I'm also highly interested in seeing this conversion available in the library before I commit to training anything with Nanotron. Just here to cheer you on!
> I need it for another project that uses nanotron and was wondering if it is something that you'd want in this repository

Yes.

> Do you think that a rather small model (that I can quickly iterate on while running locally) would be sufficient?

Yup. That would be great!!

> Where should the conversion script be located?

`/tools`? Feel free to place it wherever you like... we could change it later on
Looks very nice. Please ping me once it's ready!!
@xrsrke I think we're getting there. @AleHD made significant progress getting the tests to actually pass -- we currently get ~0.02 absolute error on the logits in both directions.
We made a copy of `examples/doremi/tests/utils.py` to the llama folder and made some modifications. It contains testing utils for llama, so I wouldn't put it inside the nanotron library. That being said, it's not super DRY, so I'm happy to make changes if you think there's a better way. wdyt?
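For context, the numerical check is essentially of this shape (a minimal sketch, not the actual test code; the tensor names and tolerance are illustrative):

```python
import torch

# Sketch only: `hf_logits` and `nanotron_logits` are the [batch, seq, vocab]
# outputs of the two models on the same input ids, computed in the same dtype.
def assert_logits_close(hf_logits: torch.Tensor, nanotron_logits: torch.Tensor, atol: float = 0.02):
    max_abs_err = (hf_logits - nanotron_logits).abs().max().item()
    assert max_abs_err <= atol, f"max absolute logit error {max_abs_err:.4f} exceeds {atol}"
```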
@yardenas very cool. feel free to ping me if you need any pointers 🤗
> We made a copy of `examples/doremi/tests/utils.py` to the llama folder and made some modifications. It contains testing utils for llama, so I wouldn't put it inside the nanotron library. That being said, it's not super DRY, so I'm happy to make changes if you think there's a better way.
The test looks good 🙌
@xrsrke -- seems to be ready on our side :)
@yardenas Given a pretrained HF model (`huggyllama/llama-7b`), I tested the following:
- Convert HF to Nanotron
- Convert back Nanotron to HF
- Run generate in HF (through `check_converted_model_generation()` in `convert_nanotron_to_hf.py`)
  - [x] with cache
  - [ ] no cache (<== doesn't yield proper results)
- Run generate in Nanotron
  - [ ] with cache
  - [x] no cache
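For reference, the original-vs-round-trip generation comparison can be sketched roughly like this (illustrative only; `hf_ckpt` is a placeholder for wherever `convert_nanotron_to_hf.py` writes the re-exported checkpoint):

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

ORIGINAL = "huggyllama/llama-7b"  # pretrained HF model
ROUNDTRIP = "hf_ckpt"             # placeholder: output of convert_nanotron_to_hf.py

tokenizer = AutoTokenizer.from_pretrained(ORIGINAL)
input_ids = tokenizer("The quick brown fox", return_tensors="pt")["input_ids"].cuda()

# Greedy decoding with both checkpoints; the token ids should match exactly.
generations = []
for path in (ORIGINAL, ROUNDTRIP):
    model = LlamaForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16).cuda()
    generations.append(model.generate(input_ids, max_new_tokens=30, do_sample=False))

assert torch.equal(generations[0], generations[1]), "round-tripped model diverged from the original"
```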
got it. any ideas why it could fail for the no cache option? I'll take a look into it
Mhmm, if I test it this way, the generation is the same:

```python
from pathlib import Path

from transformers import AutoTokenizer, LlamaForCausalLM

# TEST_PROMPT is a module-level constant defined elsewhere in the script.

def check_converted_model_generation(save_path: Path):
    """Loads a huggingface model and tokenizer from `save_path` and
    performs a dummy text generation."""
    tokenizer = AutoTokenizer.from_pretrained(save_path)
    input_ids = tokenizer(TEST_PROMPT, return_tensors="pt")["input_ids"].cuda()
    print("Inputs:", tokenizer.batch_decode(input_ids))

    model = LlamaForCausalLM.from_pretrained(save_path).cuda().bfloat16()
    out = model.generate(input_ids, max_new_tokens=100)
    print("Generation (converted): ", tokenizer.batch_decode(out))

    model_nocache = LlamaForCausalLM.from_pretrained(save_path).cuda().bfloat16()
    model_nocache.config.use_cache = False
    out_nocache = model_nocache.generate(input_ids, max_new_tokens=100)
    print("Generation (converted, no cache): ", tokenizer.batch_decode(out_nocache))
```
However, if after the "Convert back Nanotron to HF" step I manually change `use_cache=False` in `ckpt/model_config.json` before the "Run generate in HF" step (through `check_converted_model_generation()` in `convert_nanotron_to_hf.py`), then the generation is not good (maybe I am not supposed to do that).
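One way to isolate this without hand-editing the JSON is to toggle the flag per `generate()` call (a minimal sketch, assuming the converted checkpoint sits in a placeholder `ckpt/` directory; `use_cache` is a standard generation argument in transformers):

```python
from transformers import AutoTokenizer, LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("ckpt").cuda().bfloat16()  # "ckpt" is a placeholder path
tokenizer = AutoTokenizer.from_pretrained("ckpt")
input_ids = tokenizer("The capital of France is", return_tensors="pt")["input_ids"].cuda()

# Same model, same weights: only the KV-cache usage differs between the two calls.
out_cache = model.generate(input_ids, max_new_tokens=20, use_cache=True)
out_nocache = model.generate(input_ids, max_new_tokens=20, use_cache=False)
print(tokenizer.batch_decode(out_cache))
print(tokenizer.batch_decode(out_nocache))
```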
@yardenas Also, have you tried training a llama in Nanotron with DP=PP=1 & TP=2 and running `convert_nanotron_to_hf.py`?
@3outeille `config.yaml` dumping added, as requested.

> @yardenas Also, have you tried training a llama in Nanotron with DP=PP=1 & TP=2 and running `convert_nanotron_to_hf.py`?

Will add a test for this case now.
@3outeille we added a fix for the tp=2 case. :innocent:
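For anyone following along: with TP=2, nanotron stores each linear layer sharded across the two tensor-parallel ranks, so a converter has to gather the shards before building the HF state dict. A rough, illustrative sketch of that gathering step (not the actual fix in this PR; the helper name and shard layout are assumptions based on the usual Megatron-style sharding):

```python
import torch

def merge_tp_shards(shards: list[torch.Tensor], column_parallel: bool) -> torch.Tensor:
    """Concatenate per-rank weight shards back into a single HF-shaped tensor.

    Column-parallel layers (e.g. q/k/v and gate/up projections) are typically
    split along dim 0; row-parallel layers (e.g. o_proj, down_proj) along dim 1.
    """
    return torch.cat(shards, dim=0 if column_parallel else 1)
```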
@3outeille any updates? :)
@xrsrke and @3outeille I just committed the requests from @xrsrke, anything else? :)
A similar idea to the one in https://github.com/huggingface/nanotron/pull/103, but for a Llama model.
I'd be happy to implement this.
I need it for another project that uses nanotron and was wondering if it is something that you'd want in this repository. If so, I'll start working on an implementation here.
Aside from the contribution guide, are there any other guidelines for this task? For example:

- Where should the conversion script be located?
- Any gotchas I should be aware of?
- The best way to validate this would be to write a test that shows that the converted models return the same results as the non-converted ones. Do you think that a rather small model (that I can quickly iterate on while running locally) would be sufficient?

Thanks!
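On the "small model" point, one option (just a sketch; the config values and save path are arbitrary) is a tiny randomly initialized Llama, since a conversion test only needs weights to round-trip, not a pretrained model:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny config so conversion tests run in seconds on a laptop (values are arbitrary).
tiny_config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=1024,
    max_position_embeddings=128,
)
tiny_model = LlamaForCausalLM(tiny_config)
tiny_model.save_pretrained("tiny_llama_ckpt")  # placeholder path for the conversion test input
```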