Closed peng06051126 closed 1 year ago
The Galactica models were pretrained on a large number of papers (see our paper for more details), so you should be able to generate articles out of the box, but it depends on your use case.
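For anyone who wants to try this, here is a minimal sketch of out-of-the-box generation through Hugging Face `transformers`. It is not an official recipe: the helper names `build_topic_prompt` and `generate_article` are made up for illustration, the choice of the small `facebook/galactica-125m` checkpoint and the sampling settings are assumptions, and prompting with a Markdown-style title line is just one plausible way to ask for an article on a topic.

```python
def build_topic_prompt(topic: str) -> str:
    """Turn a topic into a Markdown-style title line, leaving the body to the model.

    The '# Title' prompt shape is an assumption for this sketch.
    """
    return f"# {topic.strip()}\n\n"


def generate_article(
    topic: str,
    model_name: str = "facebook/galactica-125m",  # assumed checkpoint; larger ones exist
    max_new_tokens: int = 200,
) -> str:
    """Sample a continuation of the title prompt from a pretrained Galactica model."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoTokenizer, OPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = OPTForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer(build_topic_prompt(topic), return_tensors="pt").input_ids
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling settings are illustrative, not tuned
        top_p=0.7,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_article("Multi-Head Attention")` downloads the checkpoint on first use and returns the prompt plus the sampled text. Whether the quality is good enough without fine-tuning will depend on the topic and the model size.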
Thank you for your reply. May I ask how the model performs on non-English data? Have any relevant tests been run? And what proportion of the pretraining dataset is non-English data, such as Chinese?
By design, the models are not multilingual, and most of the natural language documents in the pretraining corpus are written in English. See more in the Introduction to GALACTICA Models notebook (look for "multi-lingual").
First of all, thank you for your great contribution. I would like to fine-tune Galactica to generate articles from topics. Can you provide training data samples, or do you have any suggestions?