microsoft / CodeBERT


total pretrain time cost #227


xieexiaotuzi commented 1 year ago

Thanks for sharing such a great work. I would like to ask two questions:

1) How long did the total pretraining take?

I saw this in Section B.1:

> "We set the max length as 512 and the max training step is 100K. Training 1,000 batches of data costs 600 minutes with MLM objective, 120 minutes with RTD objective."

So is the total cost about 50 days? And are MLM and RTD not trained together (for example, MLM trained for 3 epochs and then RTD)?
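For reference, the 50-day figure corresponds to applying the quoted per-1,000-batch timings to all 100K steps of each objective:

```python
# Back-of-the-envelope estimate, assuming the quoted timings apply
# uniformly to all 100K training steps of each objective.
steps = 100_000
mlm_minutes = steps / 1_000 * 600   # 60,000 min with the MLM objective
rtd_minutes = steps / 1_000 * 120   # 12,000 min with the RTD objective
total_days = (mlm_minutes + rtd_minutes) / 60 / 24
print(total_days)                   # -> 50.0
```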

2) The max length for the discriminator is 512; what is the max length for the two generators? 512/2 = 256?

Are the outputs of the NL generator and the code generator sent to the discriminator together after they are sampled?

Thanks for your help.

guoday commented 1 year ago
  1. Please refer to the UniXcoder paper. After fixing a bug, the total cost is about 3 days. No, MLM and RTD are not trained together.
  2. For the generators, we use n-gram language models, specifically 3-gram models.
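For readers unfamiliar with the setup, below is a minimal sketch of how a 3-gram language model could serve as the generator for the RTD objective: it proposes replacement tokens from trigram statistics, and the positions that were actually replaced become the discriminator's labels. The function names and the corruption rate are illustrative assumptions, not the repository's code.

```python
import random
from collections import defaultdict

def train_trigram_model(token_sequences):
    """Count trigram continuations: (t1, t2) -> {t3: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in token_sequences:
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
    return counts

def sample_replacement(counts, context, fallback_vocab):
    """Sample a plausible token given the two previous tokens."""
    options = counts.get(context)
    if not options:
        return random.choice(fallback_vocab)
    tokens, weights = zip(*options.items())
    return random.choices(tokens, weights=weights, k=1)[0]

def corrupt_for_rtd(tokens, counts, vocab, replace_prob=0.15):
    """Replace a fraction of tokens with generator samples; positions
    that were actually replaced get label 1 (the RTD targets)."""
    corrupted, labels = list(tokens), [0] * len(tokens)
    for i in range(2, len(tokens)):
        if random.random() < replace_prob:
            context = (corrupted[i - 2], corrupted[i - 1])
            new_tok = sample_replacement(counts, context, vocab)
            if new_tok != tokens[i]:
                corrupted[i] = new_tok
                labels[i] = 1
    return corrupted, labels
```

The discriminator is then trained to predict, for each position, whether the token is original (0) or a generator replacement (1).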
xieexiaotuzi commented 1 year ago

Thanks for your feedback.

I would like to make sure: is RTD trained after MLM has finished? Or something like: MLM trained for 3 epochs and then RTD for 1 epoch?

Best

guoday commented 1 year ago

After MLM finished.
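In other words, the two objectives run sequentially rather than interleaved. A minimal sketch of that schedule, with placeholder step functions supplied by the caller (not the repository's actual training loop):

```python
# Illustrative two-stage schedule: finish the MLM budget first,
# then run RTD. Step functions are placeholders passed in by the caller.
def pretrain(model, batches, mlm_step, rtd_step, max_steps=100_000):
    # Stage 1: run the MLM objective for its full step budget.
    for _ in range(max_steps):
        mlm_step(model, next(batches))
    # Stage 2: only after MLM has finished, run the RTD objective.
    for _ in range(max_steps):
        rtd_step(model, next(batches))
```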

xieexiaotuzi commented 1 year ago

Got it. Thanks.