kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars 892 forks

Instruct GPT fine tuning #208

Closed Jbollenbacher closed 2 years ago

Jbollenbacher commented 2 years ago

Hello!

Recently GPT-3 was updated to the new InstructGPT version, produced by further fine-tuning the original GPT-3 models to make them better at following short instructions on zero-shot tasks such as question answering and classification. The InstructGPT models perform significantly better than the old ones; even the smaller 1.3B and 6B parameter InstructGPT models outperform the original 175B parameter model in some evaluations.

Are there any plans to produce an InstructGPT-like update to GPT-J?

Thanks

kingoflolz commented 2 years ago

I have no plans to produce more pretrained/finetuned models using this codebase, but the weights are available for others to fine-tune.