Open jowagner opened 3 years ago
When combining NCI with Common Crawl, ParaCrawl, OSCAR and other noisy corpora, it may be beneficial to give more weight to the clean corpora, e.g. by concatenating multiple copies of them.
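A minimal sketch of what concatenating multiple copies could look like, assuming each corpus is a plain-text UTF-8 file with one segment per line. The function name `concatenate_with_copies`, the file names and the copy counts are all hypothetical and not taken from the gaBERT pipeline:

```python
def concatenate_with_copies(corpora, output_path):
    """Write each corpus `copies` times to output_path.

    corpora: list of (path, copies) pairs; copies must be >= 1.
    Returns the total number of lines written.
    """
    written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path, copies in corpora:
            if copies < 1:
                raise ValueError(f"copies must be >= 1 for {path}")
            for _ in range(copies):
                with open(path, encoding="utf-8") as src:
                    for line in src:
                        # Add a missing final newline so that the last line of
                        # one copy does not run into the first line of the next.
                        out.write(line if line.endswith("\n") else line + "\n")
                        written += 1
    return written


# Hypothetical file names and weights, for illustration only:
# concatenate_with_copies([("nci.txt", 3), ("oscar.txt", 1)], "combined.txt")
```

The right number of copies per corpus is an open question here and would probably need to be tuned empirically. This sketch also writes the copies back to back, so it assumes that any shuffling happens later in the pre-training data pipeline.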
Yes, sounds like a good idea. I can repeat the best-performing gaBERT model with this (and the new segmentation).