Open leejason opened 2 years ago
Is the pretraining of GPT-J-6B based on CausalTransformerV2 or simply CausalTransformer? Why?
Thanks for any advice.