unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Continued pretraining facing catastrophic forgetting #1123

Closed: InderjeetVishnoi closed this issue 3 weeks ago

InderjeetVishnoi commented 3 weeks ago

Hi @danielhanchen

I tried fine-tuning the Llama 3.2-1B base model on two of my tasks, following this example notebook: https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing#scrollTo=MKX_XKs_BNZR

Here is what I observed:

1. I evaluated the model after fine-tuning on the first task: the model performed well.
2. I then continued training the model/adapter on my second task: the model performs well on task 2 but has apparently forgotten what it learned for task 1, i.e. I am facing catastrophic forgetting.

I kept the sample sizes of both training runs almost identical, and I tried playing with the learning rate for the second run (lower than the first one), but that did not work out either. Are there any guidelines/suggestions to make this work? My requirement is to progressively train the model on N tasks.
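For reference, my training flow is roughly the following (a minimal sketch only, assuming the Unsloth/TRL versions used in the linked notebook; `task1_dataset`, `task2_dataset` and the hyperparameters are placeholders, not my exact settings):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the Llama 3.2 1B base model in 4-bit, as in the notebook.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Stage 1: train the adapter on task 1.
SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=task1_dataset,  # placeholder
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="task1", per_device_train_batch_size=2,
                           learning_rate=2e-4, num_train_epochs=1),
).train()

# Stage 2: continue training the *same* adapter on task 2 only.
# This is where task 1 performance degrades (catastrophic forgetting),
# even with a lower learning rate for the second run.
SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=task2_dataset,  # placeholder
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="task2", per_device_train_batch_size=2,
                           learning_rate=5e-5, num_train_epochs=1),
).train()
```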

Sneakr commented 3 weeks ago

@InderjeetVishnoi This doesn't sound like an Unsloth issue but a general fine-tuning/training question, so this is the wrong forum for it.

If you want suggestions or guidelines, I would suggest joining the Discord channel; many users are helping out there.

https://discord.gg/unsloth

InderjeetVishnoi commented 3 weeks ago

@Sneakr Point taken :)

Sneakr commented 3 weeks ago

> @Sneakr Point taken :)

You could experiment with concatenating your datasets together (https://huggingface.co/docs/datasets/process) and training on them at once. Also try a bigger LoRA rank when doing so, to give the model more capacity to learn more complicated tasks. With a higher rank you can also try use_rslora=True (or False) and compare the difference. GL! :)
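Roughly along these lines (a sketch only; `task1_dataset`, `task2_dataset`, the rank and the target modules are placeholder values, not a recommended recipe):

```python
from datasets import concatenate_datasets
from unsloth import FastLanguageModel

# Train on both tasks at once instead of sequentially.
mixed_dataset = concatenate_datasets([task1_dataset, task2_dataset]).shuffle(seed=42)

# A higher LoRA rank gives the adapter more capacity for multiple tasks;
# use_rslora toggles rank-stabilised LoRA scaling (alpha / sqrt(r)).
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,  # or False, to compare
)
```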