mlfoundations / open_lm

A repository for research on medium sized language models.
MIT License

HF Integration #89

Open sedrick-keh-tri opened 9 months ago

sedrick-keh-tri commented 9 months ago

Hi OpenLM team! Is there interest in making OpenLM models loadable using just HF?

I see some OpenLM models up on HF, but they are not readily loadable using HF. The proposed changes would involve adding an OpenLM class on HF, similar to how other models are hosted on HF (e.g. Mistral).

For comparison, both #54 and #20 allow saved OpenLM models to be loaded through HF functions, but under the hood they still call OpenLM code and require the OpenLM library to be installed locally. What I'm thinking of is basically porting OpenLM's model.py into the transformers library itself, so that OpenLM-trained models can be shared and loaded more easily. I can work on this if you think it's a good idea.

@mitchellnw @sagadre @achalddave
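To make the idea concrete, here is a minimal sketch of what "porting model.py into transformers" could look like, using the transformers custom-model registration API. The class names (`OpenLMConfig`, `OpenLMForCausalLM`), the `model_type` string, and the tiny stand-in model body are all illustrative assumptions, not the actual port:

```python
# Hedged sketch: register an OpenLM config/model pair with transformers'
# Auto classes so users can load it with AutoModelForCausalLM alone.
# All names below are hypothetical; the real port would carry over
# OpenLM's actual transformer blocks from model.py.
import torch
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    PretrainedConfig,
    PreTrainedModel,
)


class OpenLMConfig(PretrainedConfig):
    model_type = "openlm"  # assumed model_type string

    def __init__(self, hidden_dim=64, vocab_size=256, **kwargs):
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        super().__init__(**kwargs)


class OpenLMForCausalLM(PreTrainedModel):
    config_class = OpenLMConfig

    def __init__(self, config):
        super().__init__(config)
        # Stand-in body for illustration only.
        self.embed = torch.nn.Embedding(config.vocab_size, config.hidden_dim)
        self.lm_head = torch.nn.Linear(config.hidden_dim, config.vocab_size)

    def forward(self, input_ids, **kwargs):
        # Returns raw logits of shape (batch, seq_len, vocab_size).
        return self.lm_head(self.embed(input_ids))


# Registering the pair lets AutoConfig/AutoModelForCausalLM resolve the
# "openlm" model_type without any OpenLM-specific import on the user's side.
AutoConfig.register("openlm", OpenLMConfig)
AutoModelForCausalLM.register(OpenLMConfig, OpenLMForCausalLM)
```

With this in place, a checkpoint whose config carries `"model_type": "openlm"` would load via the usual `AutoModelForCausalLM.from_pretrained(...)` call, which is the user experience the issue is asking for.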

achalddave commented 9 months ago

Discussed with @ruixin31 and @sedrick-keh-tri offline, summarizing here: this is generally something we'd like to have. The only question is one of timing and priority: we're improving openlm rapidly (e.g. #74), so we may want to put off integrating into HF to reduce maintenance effort. @sedrick-keh-tri will look into this and add it if it's easy, otherwise we'll punt until later this year.

sedrick-keh-tri commented 8 months ago

Implemented here: https://github.com/sedrick-keh-tri/transformers

Steps:

  1. Install HF transformers from the repo above instead of the usual HF transformers.
  2. You can now use AutoModelForCausalLM to load the model. (Note: Requires CUDA. Does not work for CPU.)
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("TRI-ML/openlm-1b")
    model = AutoModelForCausalLM.from_pretrained("TRI-ML/openlm-1b").to("cuda")
    a = tokenizer("hi", return_tensors="pt")
    out = model.generate(a['input_ids'].to("cuda"), max_length=60, do_sample=False)
    print(tokenizer.decode(out[0]))
  3. Another example: 7B code model
    from transformers import AutoTokenizer, AutoModelForCausalLM
    tokenizer = AutoTokenizer.from_pretrained("TRI-ML/openlm-7b-code")
    model = AutoModelForCausalLM.from_pretrained("TRI-ML/openlm-7b-code").to("cuda")
    a = tokenizer("def find_most_common(arr)", return_tensors="pt")
    out = model.generate(a['input_ids'].to("cuda"), max_length=60, do_sample=False)
    print(tokenizer.decode(out[0]))
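For step 1, a minimal sketch of the install, assuming the fork installs like any other git-hosted Python package (pinning to a specific commit would make it reproducible, but no commit is given here):

```shell
# Swap the stock transformers package for the fork that carries the
# unofficial OpenLM classes (URL from the comment above).
pip uninstall -y transformers
pip install git+https://github.com/sedrick-keh-tri/transformers
```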

Note: This is an unofficial implementation, so we aren't merging it into the HF transformers repo right now. If OpenLM wants to eventually release models, I would be in favor of integrating with HF then.

sedrick-keh-tri commented 8 months ago

Note to self (and to future OpenLM folks who want to work on this):

Testing:

Some other things we want to consider/fix for a future release: