Closed sfriedowitz closed 6 months ago
Looking now! Pulled the branch, read the instructions for direct_job_execution.ipynb, and put together a script like the one below. Where do we specify the cluster information in the new workflow? It seems like it should be somewhere in here, right? `FinetuningRayConfig`?
```python
from ray.job_submission import JobSubmissionClient
from pathlib import Path

from lm_buddy import LMBuddy
from lm_buddy.jobs.configs import (
    FinetuningJobConfig,
    FinetuningRayConfig,
    LMHarnessJobConfig,
    LMHarnessEvaluationConfig,
)
from lm_buddy.integrations.huggingface import (
    AutoModelConfig,
    TextDatasetConfig,
    TrainerConfig,
    AdapterConfig,
)
from lm_buddy.integrations.wandb import WandbRunConfig

# Base model to finetune from HuggingFace
model_config = AutoModelConfig(load_from="distilgpt2")

# Text dataset for finetuning
dataset_config = TextDatasetConfig(
    load_from="imdb",
    split="train[:100]",
    text_field="text",
)

# HuggingFace trainer arguments
trainer_config = TrainerConfig(
    max_seq_length=256,
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",
    save_steps=1,
)

# LoRA adapter settings
adapter_config = AdapterConfig(
    peft_type="LORA",
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=16,
    lora_dropout=0.2,
)

# Define tracking for finetuning run
tracking_config = WandbRunConfig(
    name="example-finetuning",
    project="lm-buddy-examples",  # Update to your project name
    entity="mozilla-ai",  # Update to your entity name
)

# Ray train settings
ray_config = FinetuningRayConfig(
    use_gpu=False,  # Change to True if GPUs are available on your machine
    num_workers=2,
)

# Full finetuning config
finetuning_config = FinetuningJobConfig(
    model=model_config,
    dataset=dataset_config,
    trainer=trainer_config,
    adapter=adapter_config,
    tracking=tracking_config,
    ray=ray_config,
)
```
> Where do we specify the cluster information in the new workflow?
Nothing is changing in how you specify the cluster information. The CLI of the package is not changed, so you can use the same commands as an entrypoint to a Ray job submission using their SDK.
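In other words, cluster details still live in the Ray job submission, not in the lm_buddy configs. A rough sketch using Ray's job SDK; the cluster address, entrypoint command, and runtime env below are illustrative assumptions, not values from this PR:

```python
# Hypothetical submission script: adjust the address, entrypoint, and
# runtime_env to match your own cluster and config file locations.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")  # your Ray head node
client.submit_job(
    entrypoint="python -m lm_buddy finetune --config finetuning_config.yaml",
    runtime_env={"working_dir": ".", "pip": ["lm-buddy"]},
)
```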
Tested and left some comments; unit tests pass and the sample job works!
Thanks! I'm a bit sidetracked at the moment, but will address most of them in the next few hours.
What's changing

- Removed the `run_job` method in favor of an `LMBuddy` class that has methods for `finetune` and `evaluate`.
- Added a `LoadableAssetPath` type and associated data structures to represent any `load_from` path for a HF asset. See inline comments for motivation for this change.

Note that the CLI API is not changed by these internal changes, so you can still execute the package as a Ray entrypoint in the same manner as before.
How to test it
Related Jira Ticket
Additional notes for reviewers
In follow-up PRs into this dev branch, I would like to do the following: