Hello,
Great work! I am quite interested in your work.
I would like to know what kind of GPUs you used for training TLM. From Table 1, I see that it was 8 GPUs for 42 hours. Are they 8 NVIDIA V100 GPUs with 32 GB, or something else?
Looking forward to your answer.
Thanks in advance.
Hi~ Thanks for your attention.
Throughout all the experiments, we use NVIDIA A100 40GB GPUs, which differ from the V100 32GB GPUs used to train RoBERTa-Large. Note that the purpose of Table 1 is to intuitively demonstrate the difference between TLM and PLM, so the comparison is qualitative and not strictly fair. We also provide quantitative results on computational cost in terms of FLOPs in Table 2 for your reference.
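Since GPU-hours depend on the hardware generation, FLOPs give a hardware-agnostic measure of training compute, which is why Table 2 uses them. As a rough illustration only (a minimal sketch, not the methodology behind Table 2), the widely used approximation of ~6 FLOPs per parameter per training token can be written as follows; the parameter and token counts below are hypothetical placeholders, not numbers from the paper:

```python
# Minimal sketch: estimate training compute in hardware-independent FLOPs
# using the common ~6 FLOPs per parameter per training token approximation.
# All numbers below are illustrative placeholders, not the paper's results.

def training_flops(num_params: int, num_tokens: int) -> float:
    """Approximate total training compute: ~6 FLOPs / parameter / token."""
    return 6.0 * num_params * num_tokens

# RoBERTa-Large has roughly 355M parameters; the token count is hypothetical.
flops = training_flops(num_params=355_000_000, num_tokens=2_000_000_000)
print(f"~{flops:.2e} FLOPs")  # comparable across A100s and V100s, unlike GPU-hours
```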
Thanks for your answer, it is very clear to me.