chengyang00 closed this 3 years ago
My understanding: Stage 1 uses synthetic data, which is also huge in size, so the main training is done on it. Stages 2 and 3 use manually annotated, accurate data containing the kinds of errors humans actually make. That data is tiny compared to the synthetic data, which is why they call it fine-tuning rather than training.
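To make that schedule concrete, here is a toy sketch. It is not the authors' code and uses no real model: a one-variable linear model stands in for the network, and the datasets, learning rates, and epoch counts are all hypothetical. The large synthetic set is generated from a slightly "wrong" rule (standing in for artificially generated errors that differ from human ones), while the two small annotated sets follow the real rule. Stage 1 trains on the big set; stages 2 and 3 continue from those weights on the small sets with smaller learning rates.

```python
"""Hypothetical staged schedule: train on large synthetic data, then
fine-tune on small, accurately annotated data. Toy linear model only."""
from dataclasses import dataclass


@dataclass
class LinearModel:
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def sgd_epoch(model: LinearModel, data, lr: float) -> None:
    # One pass of per-example SGD on squared error (weights updated in place).
    for x, y in data:
        err = model.predict(x) - y
        model.w -= lr * err * x
        model.b -= lr * err


def mse(model: LinearModel, data) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical data. Synthetic: large, but follows a slightly "wrong" rule
# (y = 1.8x + 0.6). Annotated: tiny, but follows the real rule (y = 2x + 1).
synthetic = [(i / 1000, 1.8 * (i / 1000) + 0.6) for i in range(1000)]
annotated_a = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
annotated_b = [(i / 10 + 0.05, 2 * (i / 10 + 0.05) + 1) for i in range(10)]
eval_real = [(i / 7, 2 * (i / 7) + 1) for i in range(7)]

# (name, dataset, learning rate, epochs): later stages reuse the weights
# from earlier ones and use smaller learning rates (hypothetical values).
STAGES = [
    ("stage 1: synthetic", synthetic, 0.1, 5),
    ("stage 2: annotated", annotated_a, 0.05, 50),
    ("stage 3: annotated", annotated_b, 0.02, 50),
]


def run_stages(model: LinearModel, stages, eval_data):
    # Train each stage in order, recording held-out MSE after each one.
    history = []
    for name, data, lr, epochs in stages:
        for _ in range(epochs):
            sgd_epoch(model, data, lr)
        history.append((name, mse(model, eval_data)))
    return history


if __name__ == "__main__":
    model = LinearModel()
    for name, err in run_stages(model, STAGES, eval_real):
        print(f"{name}: eval MSE = {err:.4f}")
```

In this toy, stage 1 gets the weights into roughly the right region from lots of cheap data, and the small accurate sets then shift them toward the real rule. Whether the same intuition carries over to the actual model is an assumption, not something this sketch shows.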
Thanks for answering this. You're right: the different stages use data of different quality.
What I'd like to know is why doing this improves performance. Thanks!