Closed torshie closed 10 months ago
Hi @torshie, we used the Wikipedia reference classifier from RedPajama-v1. To train such a classifier for another language, you can use the code in the rp_v1 branch: in data_prep/cc/classifier/ you will find the code used to train the Wikipedia references classifier.
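For orientation, here is a rough sketch of what preparing the training data for a new language could look like. It assumes the classifier is a fastText supervised model with two classes: pages cited as references by Wikipedia in the target language (positives) and random Common Crawl pages (negatives). The label names, helper functions, and hyperparameters below are hypothetical and not taken from the repo; check the scripts in data_prep/cc/classifier/ for the actual setup.

```python
import random
import re

# Hypothetical label names; the repo may use different ones.
POS_LABEL = "__label__wiki"
NEG_LABEL = "__label__cc"


def to_fasttext_line(label, text):
    """Format one document as a fastText supervised example.

    fastText reads one example per line, so all whitespace
    (including newlines and tabs) is collapsed to single spaces.
    """
    flat = re.sub(r"\s+", " ", text).strip()
    return f"{label} {flat}"


def build_training_lines(wiki_ref_docs, random_cc_docs, seed=0):
    """Label both document sets, drop empty documents, and shuffle."""
    lines = [to_fasttext_line(POS_LABEL, d) for d in wiki_ref_docs if d.strip()]
    lines += [to_fasttext_line(NEG_LABEL, d) for d in random_cc_docs if d.strip()]
    random.Random(seed).shuffle(lines)
    return lines


def train(train_path, model_path):
    """Train and save a classifier; needs the third-party `fasttext` package."""
    import fasttext  # imported lazily so the helpers above work without it

    # Assumed hyperparameters, chosen only for illustration.
    model = fasttext.train_supervised(
        input=train_path, lr=0.1, epoch=5, wordNgrams=2
    )
    model.save_model(model_path)
    return model
```

To use it, write the output of `build_training_lines(...)` to a text file, one example per line, and pass that path to `train(...)`. The positive documents would come from crawling the URLs cited in the target-language Wikipedia dump.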
I want to apply this pipeline to a new language, but I cannot find a Wikipedia reference classifier model for that language.
How was the English Wikipedia reference model trained? Are there any docs, links, or suggestions?
Thanks.