nyu-mll / GLUE-baselines

[DEPRECATED] Repo for exploring multi-task learning approaches to learning sentence representations
https://gluebenchmark.com

Where to download some pretraining data? #17

Closed: guotong1988 closed this issue 4 years ago

guotong1988 commented 4 years ago

Thank you very much.

W4ngatang commented 4 years ago

Hi Tong,

Many people use Wikipedia as unlabeled pretraining data; a popular version is WikiText-103 from Salesforce. Alternatively, for the smaller target tasks you can pretrain on large supervised datasets such as MultiNLI or SocialIQA. Also, we've largely migrated to jiant (https://github.com/nyu-mll/jiant), where you can find more support.
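In case it helps, here is a minimal sketch of one way to fetch those corpora with the Hugging Face `datasets` library; the dataset identifiers (`"wikitext"` / `"wikitext-103-raw-v1"`, `"multi_nli"`) are assumptions about that library's catalog, not something provided by this repo:

```python
# Sketch: download WikiText-103 (unlabeled pretraining text) and MultiNLI
# (large supervised intermediate-task data) via Hugging Face `datasets`.
# Dataset names are assumptions about the hub catalog, not part of GLUE-baselines.
from datasets import load_dataset

# WikiText-103, originally released by Salesforce
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
print(wikitext[0]["text"])

# MultiNLI, usable as supervised pretraining for smaller target tasks
mnli = load_dataset("multi_nli", split="train")
print(mnli[0]["premise"], "=>", mnli[0]["label"])
```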

Best, Alex
