vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Use Phi implementation from transformers #16

Closed by vikhyat 6 months ago

vikhyat commented 7 months ago

The text model weights provided are for the old version of Phi before it was integrated into the Huggingface transformers library. Need to write a script to convert the weight keys into the new format and switch to using the Huggingface implementation directly.
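A conversion script along these lines could do the remapping. The old and new key names below are illustrative guesses based on the general shape of the custom Phi code versus the upstream transformers layout (the actual checkpoints may differ), and the fused `Wqkv` split assumes q/k/v are stacked equally along the first dimension:

```python
import re

# Illustrative old -> new key patterns; the real checkpoint may use
# different names, so treat this table as a sketch, not gospel.
KEY_PATTERNS = [
    (r"^transformer\.embd\.wte\.(.*)", r"model.embed_tokens.\1"),
    (r"^transformer\.h\.(\d+)\.ln\.(.*)", r"model.layers.\1.input_layernorm.\2"),
    (r"^transformer\.h\.(\d+)\.mixer\.out_proj\.(.*)", r"model.layers.\1.self_attn.dense.\2"),
    (r"^transformer\.h\.(\d+)\.mlp\.(.*)", r"model.layers.\1.mlp.\2"),
    (r"^lm_head\.ln\.(.*)", r"model.final_layernorm.\1"),
    (r"^lm_head\.linear\.(.*)", r"lm_head.\1"),
]

def remap_state_dict(old_sd):
    """Rename old-format Phi keys to the transformers layout, splitting
    the fused Wqkv projection into separate q/k/v projections."""
    new_sd = {}
    for key, tensor in old_sd.items():
        m = re.match(r"^transformer\.h\.(\d+)\.mixer\.Wqkv\.(weight|bias)$", key)
        if m:
            layer, kind = m.groups()
            # Fused QKV: assume the first dimension stacks equal q, k, v blocks.
            step = len(tensor) // 3
            for i, proj in enumerate(("q_proj", "k_proj", "v_proj")):
                new_key = f"model.layers.{layer}.self_attn.{proj}.{kind}"
                new_sd[new_key] = tensor[i * step:(i + 1) * step]
            continue
        for pattern, repl in KEY_PATTERNS:
            if re.match(pattern, key):
                key = re.sub(pattern, repl, key)
                break
        new_sd[key] = tensor
    return new_sd
```

The remapped dict could then be passed to the upstream model's `load_state_dict` and saved back out with `torch.save`.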

abitha-thankaraj commented 7 months ago

Pinning to the older Hugging Face transformers commit seems to work for loading the weights. Would this be sufficient, or were you thinking of mapping the keys in the older model to the newer model format?

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Pin both the config and the remote modeling code to the pre-integration
# commit of microsoft/phi-1_5 so the old weight keys line up.
config = AutoConfig.from_pretrained('microsoft/phi-1_5',
                                    revision='24f9ea14df973a49a0d87c16d04df88d90067468',
                                    trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config,
                                         code_revision='24f9ea14df973a49a0d87c16d04df88d90067468',
                                         trust_remote_code=True)

# Load the moondream text-model weights into the freshly initialized model.
model.load_state_dict(torch.load('text_model.pt'))

We could also modify the config file in the model card and the text_encoder to remove the explicit state-dict loading.

vikhyat commented 7 months ago

I still think there's value in mapping the keys; that version doesn't support generation options like beam search, IIRC.

vikhyat commented 6 months ago

Completed: https://github.com/vikhyat/moondream/commit/1061fbf9c7e301ca18b716651dc388e48c2390a8