thtrieu / thtrieu.github.io

https://thtrieu.github.io

Files for the paper A Simple Method for Commonsense Reasoning #20

Open xiaoouwang opened 3 years ago

xiaoouwang commented 3 years ago

Hello Trieu!

Sorry to disturb you here; I was desperately trying to find a way to reach you :D

I'm an NLP student currently working on Winograd, and I'm trying to reproduce the results in your paper. I found the code at https://github.com/tensorflow/models/tree/archive/research/lm_commonsense but all the files on Google Cloud are no longer available. I'd like to know if, by some miracle, you have a backup...

Sincerely, Xiaoou

thtrieu commented 3 years ago

Hi Xiaoou WANG,

I am sorry for the inconvenience. We lost the data a while ago but managed to restore part of it (the stories corpus + train/dev/test) here:

https://drive.google.com/drive/u/1/folders/1yZzwaV8LO1hK8ChIm0sxazXF8BSIZ683

xiaoouwang commented 3 years ago

Thanks! It's really a pity...

jaewonalive commented 2 years ago

Hi Trieu,

I'm currently doing research on a large language model.

I found that Megatron-LM used the CC-stories dataset to pretrain their model, and I'm trying to reproduce their pretraining result.

However, I can no longer find the CC-stories dataset in the drive you used in the past. I also found the GitHub README, but its link does not contain the data anymore.

Could I get the CC-stories dataset? It would be really helpful for my research.

Thank you.

Best, Jaewon.