unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Continued pretraining facing catastrophic forgetting #1123

Closed: InderjeetVishnoi closed this issue 3 weeks ago

InderjeetVishnoi commented 3 weeks ago

Hi @danielhanchen

I tried fine-tuning the Llama 3.2-1B base model on two of my tasks, following this example notebook: https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing#scrollTo=MKX_XKs_BNZR

Here is what I observed:

1. I evaluated the model after fine-tuning on the first task: the model performed well.
2. I then continued training the model/adapter on my second task: the model performs well on task 2 but has apparently forgotten what it learned for task 1, i.e. I am facing catastrophic forgetting.

I kept the sample sizes of both training runs almost identical, and I tried playing with the learning rate for the second run (lower than the first one), but that did not work out either. Are there any guidelines/suggestions to make this work? My requirement is to progressively train the model on N tasks.
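For reference, my training flow is roughly the following (a minimal sketch only, assuming the Unsloth/TRL versions used in the linked notebook; `task1_dataset`, `task2_dataset` and the hyperparameters are placeholders, not my exact settings):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the Llama 3.2 1B base model in 4-bit, as in the notebook.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Stage 1: train the adapter on task 1.
SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=task1_dataset,  # placeholder
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="task1", per_device_train_batch_size=2,
                           learning_rate=2e-4, num_train_epochs=1),
).train()

# Stage 2: continue training the *same* adapter on task 2 only.
# This is where task 1 performance degrades (catastrophic forgetting),
# even with a lower learning rate for the second run.
SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=task2_dataset,  # placeholder
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="task2", per_device_train_batch_size=2,
                           learning_rate=5e-5, num_train_epochs=1),
).train()
```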

Sneakr commented 3 weeks ago

@InderjeetVishnoi This doesn't sound like an Unsloth issue but a general fine-tuning/training question, so this is the wrong forum for it.

If you want suggestions or guidelines, I would suggest joining the Discord channel; many users are helping out there.

https://discord.gg/unsloth

InderjeetVishnoi commented 3 weeks ago

@Sneakr Point taken :)

Sneakr commented 3 weeks ago

> @Sneakr Point taken :)

You could experiment with concatenating your datasets together (https://huggingface.co/docs/datasets/process) and training on them at once. Also try a bigger LoRA rank when doing so, to give the model more capacity to learn more complicated tasks. With a higher rank you can also try use_rslora=True (or False) and compare the difference. GL! :)
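Roughly along these lines (a sketch only; `task1_dataset`, `task2_dataset`, the rank and the target modules are placeholder values, not a recommended recipe):

```python
from datasets import concatenate_datasets
from unsloth import FastLanguageModel

# Train on both tasks at once instead of sequentially.
mixed_dataset = concatenate_datasets([task1_dataset, task2_dataset]).shuffle(seed=42)

# A higher LoRA rank gives the adapter more capacity for multiple tasks;
# use_rslora toggles rank-stabilised LoRA scaling (alpha / sqrt(r)).
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,  # or False, to compare
)
```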