okuvshynov / slowllama

Finetune llama2-70b and codellama on MacBook Air without quantization
MIT License
431 stars 33 forks

Fine-tune other models #8

Open Gincioks opened 8 months ago

Gincioks commented 8 months ago

Hello,

Can we apply this method to fine-tune models other than llamas and codellama, such as mistral 7b?

Many thanks in advance!

okuvshynov commented 8 months ago

That should be possible in principle, but some of the code might be model specific now. Could you point me to the model you have in mind?

I could look into that.

okuvshynov commented 8 months ago

https://github.com/mistralai/mistral-src/blob/main/mistral/model.py this one?

Gincioks commented 8 months ago

I'm relatively new to AI development, but I'm interested in a fine-tuned version of Mistral Orca. It's available here: Mistral 7B OpenOrca on Hugging Face. However, it seems like this model is in a Hugging Face format, which may not be directly compatible with the code, yes?

You can find the original weights for the Mistral 7B model here: Original Weights for Mistral 7B.

Gincioks commented 8 months ago

I tried to find a method for converting HF weights to the original PyTorch format, but nothing came up.

Gincioks commented 8 months ago

> https://github.com/mistralai/mistral-src/blob/main/mistral/model.py this one?

Yes

okuvshynov commented 8 months ago

Looking at https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/pytorch_model.bin.index.json it should be possible to modify the loading to make it work. Need some updates to the loader code though.
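
Very roughly, the loader change might look something like this (untested sketch; `apply_weight` is a hypothetical stand-in for slowllama's weight-copying logic, and the HF-to-llama key renaming is only illustrated in the comment):

```python
import json
import torch

# pytorch_model.bin.index.json maps each parameter name to the shard
# file that contains it, e.g. "model.layers.0.mlp.gate_proj.weight"
# -> "pytorch_model-00001-of-00002.bin".
with open("Mistral-7B-v0.1/pytorch_model.bin.index.json") as f:
    index = json.load(f)["weight_map"]

# Group parameter names by shard so each .bin file is read only once.
shards = {}
for name, shard_file in index.items():
    shards.setdefault(shard_file, []).append(name)

for shard_file, names in shards.items():
    state = torch.load(f"Mistral-7B-v0.1/{shard_file}", map_location="cpu")
    for name in names:
        # apply_weight (hypothetical) would rename the HF key, e.g.
        # model.layers.N.mlp.gate_proj.weight -> layers.N.feed_forward.w1.weight,
        # and copy the tensor into the corresponding slowllama module.
        apply_weight(name, state[name])
    del state  # free memory before loading the next shard
```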

Gincioks commented 8 months ago

Do you have any suggestions for getting started? I want to put this into action, despite the fact that there will be a lot to learn :D

okuvshynov commented 8 months ago

@Gincioks - I'm not entirely sure about the best way, but here's probably how I'd do it:

  1. Download mistral model
  2. Download their reference implementation
  3. Try to load it and continue some prompt to check that it works (without slowllama, just their reference)
  4. If it works, we can try importing some of it to slowllama
  5. First step is loading - it will definitely require changes to the loader, maybe to the model as well. It should be ok to break things at this point - just make it work with the new model, and we can decide later how to generalize this - what needs to be configurable, etc.
  6. Then we need to make sure the forward pass works. Compare the output we get here with the one we get from the reference implementation (see the sketch after this list).
  7. After that, the backward pass should be straightforward.

Thank you for looking into this!
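For step 6, a minimal sketch of the kind of check I mean (the forward calls are pseudocode - the two code bases have different calling conventions):

```python
import torch

# Run the same tokens through both implementations and compare logits.
tokens = torch.tensor([[1, 851, 349, 264, 1369]])  # any short prompt's token ids

with torch.no_grad():
    ref_logits = reference_model(tokens)   # mistral reference implementation
    port_logits = ported_model(tokens)     # slowllama port

# Exact equality is too strict (dtype, op order), so compare with a tolerance
# and look at the worst-case difference.
print("max abs diff:", (ref_logits - port_logits).abs().max().item())
assert torch.allclose(ref_logits.float(), port_logits.float(), atol=1e-2)
```
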
Gincioks commented 8 months ago

Currently I'm facing this error when trying to prepare the model:

```
File "slowllama/models_manager.py", line 76, in prepare_model
    prepare_mistal_model(
File "slowllama/mistral/mistral_loader.py", line 114, in prepare_mistal_model
    apply_subset(submodule, weight_subset, ci, title)
File "slowllama/mistral/mistral_loader.py", line 53, in apply_subset
    module.weight[idx_subset] = weight_subset
    ~~~~~~~~~~~~~^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (11008) must match the existing size (14336) at non-singleton dimension 0. Target sizes: [11008, 4096]. Tensor sizes: [14336, 4096]
```

Any thoughts?
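
The two numbers in the error are suggestive: 11008 is the llama2-7b FFN hidden size, while Mistral-7B uses 14336, so the model is presumably still being built with llama2 dimensions. A sketch of the relevant config difference (field names assumed to mirror a llama-style ModelArgs):

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    dim: int = 4096          # same for llama2-7b and mistral-7b
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8      # mistral uses grouped-query attention; llama2-7b has 32
    hidden_dim: int = 14336  # mistral FFN size; a llama2-7b config yields 11008 here
    vocab_size: int = 32000
```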

Update: I was able to prepare the model and launch inference through your code. I needed to change the FeedForward class. But now I have a problem: the model gives random tokens. It could still be a problem with the forward pass.
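
For reference, the feed-forward block has the same gated-SiLU shape in both models, so the change is mostly the hidden size; a generic sketch:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    # SwiGLU feed-forward used by both llama2 and mistral;
    # only hidden_dim differs (11008 vs 14336).
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

As for the random tokens: one likely suspect is attention - Mistral-7B uses grouped-query attention (8 KV heads vs 32 query heads), so llama2-7b-style attention code would need to repeat the KV heads before the dot product.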

okuvshynov commented 8 months ago

could you share your code somewhere? Maybe a branch in your forked repo?

Gincioks commented 8 months ago

Yes, yes, I will share the code. I made too many changes, so I will start a new repo. Also, I was able to get generation working perfectly. Now I will do the same with fine-tuning.

okuvshynov commented 8 months ago

yeah, i think doing that in the forked version might be a good option. thank you for looking into this!

Gincioks commented 8 months ago

Hey, this is the new repository: https://github.com/Gincioks/PicoTuner. I intend to use this as a package in another project, so I created a small CLI for easier use.