ayaka14732 closed this issue 1 year ago
The paper doesn't seem to mention what the pre-training dataset is.
The corpora of the minority languages are in-house data, consisting of short monolingual sentences. The total corpora size is 28 GB. The statistics of the pre-training corpora are listed in Appendix A.