kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars 892 forks

Did you use the splits made by the Pile directly? #207

Closed: boyang9602 closed this issue 2 years ago

boyang9602 commented 2 years ago

Hi,

I just want to confirm whether you used the dataset splits provided by the Pile for training. The Pile's description says:

"The Pile is provided as train, validation, and testing splits. The validation and testing components each contain 0.1% of the data, sampled uniformly at random."

Did you train the model directly on the Pile's train split, or did you mix the data and create your own splits?

Thank you very much!

Best regards,
Bo Yang