jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

GaLore fine-tuning #stopped #51

Open j-datta opened 3 weeks ago

j-datta commented 3 weeks ago
import os
import datasets
from transformers import Trainer, TrainingArguments

# Configuration parameters
model_name_or_path = "mistralai/Mistral-7B-v0.1"
max_length = 128
doc_stride = 128
pad_to_max_length = True
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
learning_rate = 0.0002
weight_decay = 0.0
num_train_epochs = 1
gradient_accumulation_steps = 1
output_dir = "/home/IAIS/jdatta/teacher_model"
seed = 42

# Load the datasets
squad = datasets.load_dataset("rajpurkar/squad_v2")
dataset = squad['train'].train_test_split(test_size=0.2)
train_dataset = dataset['train']
eval_dataset = dataset['test']

train_dataset = train_dataset.select(range(1000))
eval_dataset = eval_dataset.select(range(500))

training_args = TrainingArguments(
    output_dir=output_dir,
    evaluation_strategy="steps",
    warmup_ratio=0.05,
    overwrite_output_dir=True,
    gradient_accumulation_steps=gradient_accumulation_steps,
    per_device_train_batch_size=per_device_train_batch_size,
    per_device_eval_batch_size=per_device_eval_batch_size,
    num_train_epochs=num_train_epochs,
    fp16=True,
    eval_steps=10,
    save_strategy='steps',
    save_steps=10,
    save_total_limit=1,
    dataloader_num_workers=2,
    load_best_model_at_end=True,
    report_to="none",
    prediction_loss_only=True,
    gradient_checkpointing=True,
    optim_args="rank=64, update_proj_gap=100, scale=0.10",
    optim="galore_adafactor",
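    # Note: "c_attn" / "c_proj" are GPT-2-style module names; Mistral and Llama
    # instead use q_proj / k_proj / v_proj / o_proj / gate_proj / up_proj / down_proj.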
    optim_target_modules=["c_attn", "c_proj", "q_proj", "k_proj", "v_proj", "down_proj", "up_proj"],
    learning_rate=learning_rate,
    weight_decay=weight_decay,
)

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
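
# `model` and `data_collator` are created earlier in the script (model/tokenizer
# loading and the SQuAD preprocessing are not shown in this excerpt).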

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)
trainer.train()

The training is not starting. It has been showing only the following messages for 2 hours:

/home/IAIS/jdatta/miniconda3/envs/myenv/lib/python3.11/site-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
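A small tweak, not part of the script above, that only changes warning/logging behaviour and makes it easier to see whether the run is actually moving (the `output_dir` value is the one defined earlier):

import os
from transformers import TrainingArguments

os.environ["TOKENIZERS_PARALLELISM"] = "false"   # silences the tokenizers fork warning

training_args = TrainingArguments(
    output_dir=output_dir,          # same value as in the script above
    logging_strategy="steps",
    logging_steps=1,                # report the loss at every optimizer step
    # ... keep the remaining arguments from the script above unchanged ...
)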

Should I tune any parameters? I've also tried Mistral-7B, Phi-2, and Llama-7B.
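For reference, a minimal sketch of the standalone optimizer usage shown in the GaLore README; it exposes the same rank / update_proj_gap / scale knobs that the `optim_args` string above sets. The module-name filter and learning rate here are illustrative assumptions, and GaLoreAdamW is used only because that is the variant in the README example (the script above goes through `galore_adafactor` via the Trainer).

from transformers import AutoModelForCausalLM
from galore_torch import GaLoreAdamW

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Route only the 2-D projection weights through GaLore; everything else
# stays in a regular parameter group.
targets = ("q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj")
galore_params = [p for n, p in model.named_parameters()
                 if p.dim() == 2 and any(t in n for t in targets)]
galore_ids = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in galore_ids]

param_groups = [
    {"params": regular_params},
    # rank / update_proj_gap / scale mirror the optim_args string above
    {"params": galore_params, "rank": 64, "update_proj_gap": 100,
     "scale": 0.10, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=2e-4)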