kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars 892 forks

Difference between the inputs to GPT-J6B and GPT-2? #127

Closed BakingBrains closed 3 years ago

BakingBrains commented 3 years ago

Can you please explain how feeding data to GPT-J differs from feeding data to GPT-2, for example for Python code generation? Is the data feeding method the same for both models? With GPT-2, to generate Python code we need to give a piece of Python code as the prompt, whereas with GPT-J6B we can only give prompts like 'write a program to add two numbers'.

kingoflolz commented 3 years ago

There are significant differences in the data distribution between the two models, which means prompts don't necessarily transfer from one model to the other.
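
To make the two prompt styles from the question concrete, here is a minimal sketch that only builds the prompt strings. The helper names `code_prefix_prompt` and `instruction_prompt` are hypothetical and not part of this repo, and the sketch does not call either model; which style works better for a given model is something you would have to check empirically.

```python
def code_prefix_prompt(signature: str, docstring: str) -> str:
    """Build a prompt that starts the code itself, so the model continues the body.

    Hypothetical helper for illustration only.
    """
    return f'{signature}\n    """{docstring}"""\n'


def instruction_prompt(task: str) -> str:
    """Build a prompt phrased as a natural-language request.

    Hypothetical helper for illustration only.
    """
    return f"{task}\n"


# Style 1: give the model a piece of Python code to continue.
code_prompt = code_prefix_prompt("def add(a, b):", "Return the sum of a and b.")

# Style 2: describe the task in plain English.
text_prompt = instruction_prompt("Write a program to add two numbers")

print(code_prompt)
print(text_prompt)
```

Because the data distributions differ, it is probably worth trying both styles against each model rather than assuming one prompt works for both.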