What I want to do: fine-tune MPT-1b with TRL's `SFTTrainer`, mainly for its NEFTune support:

```python
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# trust_remote_code pulls in the custom MosaicGPT implementation
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-1b-redpajama-200b",
    trust_remote_code=True,
    attn_impl="torch",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=tokenized_train_data["train"],
    eval_dataset=tokenized_val_data["validation"],
    dataset_text_field="text",
    args=training_args,
    neftune_noise_alpha=5,  # NEFTune is the only feature I actually need here
)
```
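In the meantime, a possible workaround for the NEFTune part specifically is to hook the embedding layer by hand instead of relying on `SFTTrainer`'s integration. This is a sketch: `attach_neftune` and `neftune_hook` are hypothetical helper names, and it assumes the model exposes `get_input_embeddings()`; the noise rule mirrors the published NEFTune recipe (uniform noise scaled by `alpha / sqrt(seq_len * hidden_dim)`):

```python
import torch
import torch.nn as nn

def neftune_hook(module, inputs, output):
    # NEFTune: during training only, add uniform noise U(-m, m) to the
    # embedding output, with m = alpha / sqrt(seq_len * hidden_dim).
    if module.training:
        seq_len, hidden_dim = output.size(1), output.size(2)
        mag = module.neftune_noise_alpha / (seq_len * hidden_dim) ** 0.5
        output = output + torch.zeros_like(output).uniform_(-mag, mag)
    return output

def attach_neftune(model, alpha=5.0):
    # Hypothetical helper: register the noise hook on the input-embedding
    # layer of any model that exposes get_input_embeddings().
    emb = model.get_input_embeddings()
    emb.neftune_noise_alpha = alpha
    return emb.register_forward_hook(neftune_hook)
```

The hook fires only in `train()` mode, so evaluation and generation are unaffected, and calling `.remove()` on the returned handle restores plain embeddings.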
Yet it fails with various missing features in the MPT-1b implementation, and potentially others.
Please help the community use MPT-1b by either:
a) retraining MPT-7b at the 1B-parameter size on the MPT-7b code base, or
b) updating the MPT-1b code base (which diverges slightly from the MPT-7b architecture).