Thanks for sharing such great work. I would like to ask two questions:
1) How long does the total pretraining take?
In Section B.1, I saw:
######################
"We set the max length as 512 and the max training step is 100K. Training 1,000 batches of data costs 600 minutes with MLM
objective, 120 minutes with RTD objective".
######################
So is the total cost 50 days? And are MLM and RTD not trained together? (For example, MLM is trained for 3 epochs, and then RTD.)
2) The max length for the discriminator is 512; what is the max length for the two generators? 512/2 = 256?
Are the outputs of the NL generator and the code generator sent to the discriminator together after they are sampled?
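For reference, this is how I arrived at the 50-day figure in question 1. It is only a rough sketch under my own assumptions, which the paper does not state: that one training step equals one batch, and that MLM and RTD each run for the full 100K steps one after the other. The variable and function names are mine.

```python
# Timings quoted from Section B.1 (minutes per 1,000 batches).
MINUTES_PER_1000_BATCHES = {"MLM": 600, "RTD": 120}
MAX_STEPS = 100_000  # "the max training step is 100K"


def days_for(objective: str, steps: int = MAX_STEPS) -> float:
    """Estimated wall-clock days, assuming one step == one batch (my assumption)."""
    minutes = steps / 1000 * MINUTES_PER_1000_BATCHES[objective]
    return minutes / (60 * 24)


mlm_days = days_for("MLM")
rtd_days = days_for("RTD")
# Assumes the two objectives are trained sequentially, each for 100K steps.
total_days = mlm_days + rtd_days

print(f"MLM: {mlm_days:.1f} days, RTD: {rtd_days:.1f} days, total: {total_days:.1f} days")
# prints: MLM: 41.7 days, RTD: 8.3 days, total: 50.0 days
```

If the two objectives are instead trained jointly in the same step, or on multiple GPUs in parallel, the estimate would be different, which is why I am asking.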
Thanks for your help.