Open nichellehouston opened 4 days ago
@nichellehouston Getting the highest quality typically depends on the model and the quality of the dataset, but the main hyperparameters should be:
```python
# Model Hyperparameters
learning_rate = 2e-4    # Starting with a lower learning rate for stability
lora_alpha = 32         # Higher value to capture more complexity
num_train_epochs = 3    # Training for more epochs to capture better patterns
r = 16                  # Small r: faster training, less adaptation capacity.
                        # Large r: slower training, more powerful adaptation (more parameters).
```
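As a concrete sketch of where these values plug in, assuming the Hugging Face `peft` library is being used for LoRA (the `target_modules` below are typical attention projections, but the right names depend on your model):

```python
from peft import LoraConfig

# Hedged sketch: a LoRA adapter config matching the values above.
# All values are illustrative, not a definitive recipe.
lora_config = LoraConfig(
    r=16,                 # adapter rank: higher = more capacity, slower training
    lora_alpha=32,        # scaling factor; effective scale is lora_alpha / r
    lora_dropout=0.0,
    target_modules=["q_proj", "v_proj"],  # assumed; varies by model architecture
    task_type="CAUSAL_LM",
)
```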
Train for a few steps, then stop and save the loss for comparison. Next, change the hyperparameter values, train again, and compare the losses.
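The compare-the-loss procedure above can be sketched as a small helper that picks the run with the lowest final loss (the run labels and loss values here are made up for illustration):

```python
def best_config(results):
    """Return the run label with the lowest final training loss.

    results: dict mapping a run label to its final loss.
    """
    return min(results, key=results.get)

# Hypothetical final losses from two short training runs.
runs = {
    "r16_alpha32_lr2e-4": 1.42,
    "r8_alpha16_lr2e-4": 1.57,
}
print(best_config(runs))  # → r16_alpha32_lr2e-4
```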
I feel like you can increase `per_device_train_batch_size` to maximize your VRAM usage, since your GPU has 48 GB .-.
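One way to reason about that trade-off: `per_device_train_batch_size` and `gradient_accumulation_steps` multiply into an effective batch size, so raising the first while lowering the second uses more VRAM per step without changing how many examples feed each optimizer update. A minimal arithmetic sketch:

```python
def effective_batch_size(per_device, grad_accum, num_gpus=1):
    # Effective batch = examples per device * accumulation steps * device count.
    return per_device * grad_accum * num_gpus

# The settings in this thread use 2 * 4 = 8 examples per optimizer step.
print(effective_batch_size(2, 4))   # → 8
# On a 48 GB card you might instead try a larger per-device batch, e.g. 8 * 1 = 8.
print(effective_batch_size(8, 1))   # → 8
```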
For training a model on a 48 GB GPU to translate between languages, how should I change these settings to get the highest quality?

```python
r = 16
lora_alpha = 16
lora_dropout = 0
random_state = 3407
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
warmup_steps = 5
num_train_epochs = 1
learning_rate = 2e-4
seed = 3407
```
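For reference, here is roughly where the training-side settings from that list would go, assuming the Hugging Face `transformers` Trainer API (the output directory is a placeholder):

```python
from transformers import TrainingArguments

# Hedged sketch: mapping the question's settings onto TrainingArguments.
args = TrainingArguments(
    output_dir="outputs",             # placeholder path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=1,
    learning_rate=2e-4,
    seed=3407,
)
```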