kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0
6.29k stars, 892 forks

add Gopher 230B results #168

Closed: djoldman closed 2 years ago

djoldman commented 2 years ago

The paper reports: "For all MassiveText subsets, we filter out non-English documents, process data into a homogeneous text-only format, deduplicate documents, and filter out documents too similar to those in our test sets." It therefore seems safe to assume there is no test-set contamination.

https://storage.googleapis.com/deepmind-media/research/language-research/Training%20Gopher.pdf

kingoflolz commented 2 years ago

Thanks