Closed taghreed34 closed 2 years ago
As reported in the paper, the training duration for CoNLL-2003 NER was 203 minutes using a single V100 GPU (Table 11). Was this training time for luke_large or luke_base? And to what extent does using mixed precision training based on the APEX library affect the training time?

Hi @taghreed34, we used luke-large in our experiments, so the training time corresponds to the large model. It depends on the computational environment, but mixed precision training based on the APEX library (O2 mode) generally reduces the training time significantly.
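For reference, enabling APEX O2 mixed precision in a PyTorch training loop looks roughly like the sketch below. This is an illustrative placeholder, not LUKE's actual training code: the model, optimizer choice, learning rate, and data loader are all assumptions.

```python
import torch
from apex import amp  # NVIDIA Apex must be installed separately

model = ...      # placeholder: e.g. a LUKE model for NER
loader = ...     # placeholder: your training DataLoader
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# O2 keeps FP32 master weights but casts the model and inputs to FP16,
# which typically speeds up training substantially on Volta+ GPUs.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

for batch in loader:
    optimizer.zero_grad()
    loss = model(**batch)
    # Scale the loss to avoid FP16 gradient underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```

The actual speedup depends on the GPU (Tensor Cores help most), batch size, and sequence length, so the 203-minute figure from Table 11 will vary across environments.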