kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.26k stars 890 forks

CausalTransformerV2 or CausalTransformer? #220

Open leejason opened 2 years ago

leejason commented 2 years ago

Was GPT-J-6B pretrained with CausalTransformerV2 or with the original CausalTransformer, and why was that one chosen?

Thanks for any advice.