What I want to do: fine-tune MPT-1b with TRL's `SFTTrainer`, mainly for its NEFTune support:

```python
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# trust_remote_code pulls in the custom MosaicGPT implementation
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-1b-redpajama-200b",
    trust_remote_code=True,
    attn_impl="torch",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train_data["train"],
    eval_dataset=tokenized_val_data["validation"],
    dataset_text_field="text",
    args=training_args,
    neftune_noise_alpha=5,  # NEFTune is the only feature I actually need here
)
```
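In the meantime, a possible workaround for the NEFTune part specifically is to hook the embedding layer by hand instead of relying on `SFTTrainer`'s integration. This is a sketch: `attach_neftune` and `neftune_hook` are hypothetical helper names, and it assumes the model exposes `get_input_embeddings()`; the noise rule mirrors the published NEFTune recipe (uniform noise scaled by `alpha / sqrt(seq_len * hidden_dim)`):

```python
import torch
import torch.nn as nn

def neftune_hook(module, inputs, output):
    # NEFTune: during training only, add uniform noise U(-m, m) to the
    # embedding output, with m = alpha / sqrt(seq_len * hidden_dim).
    if module.training:
        seq_len, hidden_dim = output.size(1), output.size(2)
        mag = module.neftune_noise_alpha / (seq_len * hidden_dim) ** 0.5
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output

def attach_neftune(model, alpha=5.0):
    # Hypothetical helper: register the noise hook on the input-embedding
    # layer of any model that exposes get_input_embeddings().
    emb = model.get_input_embeddings()
    emb.neftune_noise_alpha = alpha
    return emb.register_forward_hook(neftune_hook)
```

The hook fires only in `train()` mode, so evaluation and generation are unaffected, and calling `.remove()` on the returned handle restores plain embeddings.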
Yet it fails with various missing features in the MPT-1b implementation, and potentially others.
Please help the community use MPT-1b by either:
a) retraining MPT-7b at the 1B-parameter size on the MPT-7b code base, or
b) updating the MPT-1b code base (which diverges slightly from the MPT-7b architecture).