Open skye95git opened 1 year ago
In the paper, the C4 dataset is used to pre-train UniXcoder. Which subset of C4 is used in the paper? en?
en
How many examples are there in your training set?
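from datasets import load_dataset
# Load the train split of the English ("en") subset of C4.
dataset = load_dataset("c4","en")["train"]
# len(dataset) then gives the number of training examples.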
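If you only want the count and would rather not download the whole split first, something like the following might work. This is a rough sketch, not what the authors ran: `count_examples` and `count_c4_train` are hypothetical helper names, and it assumes your installed `datasets` version supports `streaming=True` and still resolves the `"c4"` identifier used above (newer versions may need `"allenai/c4"` instead).

```python
from itertools import islice


def count_examples(examples, limit=None):
    """Count the items in any iterable, stopping early after `limit` items if given."""
    iterator = examples if limit is None else islice(examples, limit)
    return sum(1 for _ in iterator)


def count_c4_train(limit=None):
    """Hypothetical helper: stream the C4 "en" train split and count its examples.

    Assumption: `load_dataset` accepts `streaming=True` for this dataset in
    your environment. Nothing is downloaded until iteration starts.
    """
    from datasets import load_dataset  # imported lazily so count_examples works without it

    stream = load_dataset("c4", "en", split="train", streaming=True)
    return count_examples(stream, limit=limit)
```

Calling `count_c4_train(limit=1000)` is a quick way to check that streaming works. A full pass with `limit=None` reads the entire split over the network and will take a long time, so with the non-streaming load above, `len(dataset)` is the simpler way to get the total.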