unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

[BUG] rope scaling with phi3 models #406

Open arunpatala opened 5 months ago

arunpatala commented 5 months ago

Hi,

I am trying to train the Phi-3 mini model with a longer context length (8192) than its default of 4096. I understand that RoPE scaling is not supported for models with a sliding window. How can I proceed from here to train a Phi-3 model with a longer context? Should I fine-tune the base model to extend its context length? Which methods can I use? Is there a plan to support this in the future?

AlgorithmError: ExecuteUserScriptError: ExitCode 1
ErrorMessage: "raise RuntimeError( RuntimeError: Unsloth: Unfortunately Mistral type models do not support RoPE scaling! The maximum sequence length supported is 4096."
Command: "/opt/conda/bin/python3.10 run_unsloth.py --bf16 True --dataset_path /opt/ml/input/data/training --eval_steps 1000 --evaluation_strategy steps --fp16 False --gradient_accumulation_steps 2 --gradient_checkpointing True --learning_rate 0.0002 --load_in_4bit True --logging_dir /opt/ml/output/tensorboard --logging_steps 10 --lr_scheduler_type linear --max_seq_length 8192 --model_name unsloth/Phi-3-mini-4k-instruct-bnb-4bit --neftune_noise_alpha 5 --num_train_epochs 2 --optim adamw_8bit --output_dir /opt/ml/checkpoints --per_device_eval_batch_size 6 --per_device_train_batch_size 6 --report_to tensorboard --save_strategy epoch --seed 3407 --train_filename train.parquet --validation_filename val.parquet --warmup_steps 5 --weight_decay 0.01"
exit code: 1
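
For reference, a minimal sketch of the Unsloth loading call that reproduces this error; the model name, sequence length, and 4-bit flag are taken from the command above, everything else is an illustrative assumption:

```python
from unsloth import FastLanguageModel

# Requesting a longer context than the model's native 4096 makes Unsloth
# attempt RoPE scaling, which raises the RuntimeError above for
# sliding-window (Mistral-type) architectures such as Phi-3.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3-mini-4k-instruct-bnb-4bit",
    max_seq_length=8192,  # > 4096 native context -> triggers the error
    load_in_4bit=True,
)
```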

danielhanchen commented 5 months ago

@arunpatala Oh yep that is an issue - planning to support the 128K Phi later down the road
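
Once that support lands, loading a long-context Phi-3 would presumably look roughly like the sketch below. The 128K checkpoint microsoft/Phi-3-mini-128k-instruct exists on the Hugging Face Hub; whether and when Unsloth accepts it is exactly what this issue is about, so treat this as a hypothetical:

```python
from unsloth import FastLanguageModel

# Hypothetical usage once 128K Phi-3 is supported: the checkpoint ships
# with a 128K native context, so no RoPE scaling is needed for 8192.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="microsoft/Phi-3-mini-128k-instruct",  # 128K-context checkpoint
    max_seq_length=8192,
    load_in_4bit=True,
)
```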

arunpatala commented 5 months ago

thanks