sedrick-keh-tri opened this issue 1 year ago (status: open).
Discussed with @ruixin31 and @sedrick-keh-tri offline; summarizing here: this is generally something we'd like to have. The only question is one of timing and priority: we're improving openlm rapidly (e.g. #74), so we may want to put off integrating with HF to reduce maintenance effort. @sedrick-keh-tri will look into this and add it if it's easy; otherwise, we'll punt until later this year.
Implemented here: https://github.com/sedrick-keh-tri/transformers
Steps (with the fork above installed in place of the stock transformers package):
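```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the 1B OpenLM checkpoint from the HF Hub and move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained("TRI-ML/openlm-1b")
model = AutoModelForCausalLM.from_pretrained("TRI-ML/openlm-1b").to("cuda")

# Tokenize a short prompt and greedily decode up to 60 tokens (prompt included).
inputs = tokenizer("hi", return_tensors="pt")
out = model.generate(inputs["input_ids"].to("cuda"), max_length=60, do_sample=False)
print(tokenizer.decode(out[0]))
```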
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# Same flow with the 7B code model, prompted with a function signature.
tokenizer = AutoTokenizer.from_pretrained("TRI-ML/openlm-7b-code")
model = AutoModelForCausalLM.from_pretrained("TRI-ML/openlm-7b-code").to("cuda")

# Greedy decoding, capped at 60 tokens total (prompt included).
inputs = tokenizer("def find_most_common(arr)", return_tensors="pt")
out = model.generate(inputs["input_ids"].to("cuda"), max_length=60, do_sample=False)
print(tokenizer.decode(out[0]))
```
Note: This is an unofficial implementation, so we aren't merging it into the HF transformers repo right now. If OpenLM eventually wants to release models, I would be in favor of integrating with HF then.
Note to self (and to future OpenLM folks who want to work on this):
When `x` is fed into each layer (open_lm `Block`), its shape is (bsz, seq_len, hidden_dim) (https://github.com/mlfoundations/open_lm/blob/e01685554b04624ffdb2d86c1970485232c14e9f/open_lm/model.py#L284). Meanwhile, during HF cached generation the shape of `x` is (bsz, 1, hidden_dim), since only the newest token is fed in at each step. As a result, I had to cache Q as well, rather than just K and V (see this commit). Caching Q lets us reconstruct the full Q when we pass it to xformers.
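A toy sketch of why caching Q is enough to reproduce open_lm's full-sequence computation, in plain Python (single head, batch of 1, made-up values). It assumes the underlying mismatch is causal-mask alignment: if the causal mask is aligned to the top-left corner, a single (bsz, 1, hidden_dim) query only sees key 0. The `attention` helper and all cached values below are hypothetical and not taken from open_lm or HF.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q_rows, k_rows, v_rows, causal_top_left):
    """Single-head scaled dot-product attention on plain lists.

    q_rows: q_len query vectors; k_rows, v_rows: kv_len key/value vectors.
    With causal_top_left=True, query i may only attend to keys 0..i, i.e. the
    causal mask is aligned to the top-left corner (an assumption for this
    sketch, not taken from open_lm).
    """
    scale = 1.0 / math.sqrt(len(q_rows[0]))
    out = []
    for i, q in enumerate(q_rows):
        n_visible = i + 1 if causal_top_left else len(k_rows)
        keys, values = k_rows[:n_visible], v_rows[:n_visible]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical cached projections for a 4-token prefix (seq_len=4, head_dim=2).
q_cache = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4], [0.7, 0.1]]
k_cache = [[0.2, 0.1], [-0.3, 0.6], [0.4, 0.4], [0.1, -0.5]]
v_cache = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]

# Roughly the Q-caching approach: rebuild the full (seq_len, head_dim) Q,
# run causal attention over the whole sequence, keep only the newest position.
full_seq = attention(q_cache, k_cache, v_cache, causal_top_left=True)[-1]

# HF-style step: only the newest query (seq_len=1). It may see every cached
# key, so no mask is needed; the result matches full_seq exactly.
one_step = attention(q_cache[-1:], k_cache, v_cache, causal_top_left=False)[0]

# Pitfall: a top-left-aligned causal mask on the 1-token query hides every
# key except key 0, so the output collapses to v_cache[0].
masked_step = attention(q_cache[-1:], k_cache, v_cache, causal_top_left=True)[0]

print(full_seq == one_step)   # True
print(masked_step)            # [1.0, 0.0]
```

The trade-off: rebuilding the full Q recomputes attention over the whole prefix at every step, so it stays close to open_lm's existing code path but does more work than single-query attention against cached K and V.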
Some other things we want to consider or fix for a future release:
Hi OpenLM team! Is there interest in making OpenLM models loadable using just HF?
I see some OpenLM models up on HF, but they can't readily be loaded with HF alone. The proposed change would add an OpenLM model class to HF transformers, the same way other architectures (e.g. Mistral) are supported there.
For comparison, both #54 and #20 allow saved OpenLM models to be loaded using HF functions, but under the hood they still call OpenLM functions and require the OpenLM library to be installed locally. What I have in mind is porting OpenLM's model.py into the transformers library itself, so that OpenLM-trained models can be shared and loaded more easily. I can work on this if you think it's a good idea.
@mitchellnw @sagadre @achalddave