The section "Data/English Coreference:" says, "The text was extracted and cleaned, to have one Wikipedia paragraph per line, then downsampled and tokenised using the NLTK tokeniser, ..." Could you tell me how to downsample the data? I cannot find any relevant file in `data_cleaning/wiki_data_cleaning`.
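For reference, since the repository does not seem to document this step, one common way to downsample a one-paragraph-per-line corpus is to keep a reproducible random fraction of the lines before tokenisation. The `downsample` helper below is purely hypothetical and is not taken from this codebase; the actual method and sampling rate used by the authors may differ.

```python
import random

def downsample(lines, fraction=0.1, seed=42):
    """Keep roughly `fraction` of the paragraphs, reproducibly.

    Hypothetical sketch: each line (one Wikipedia paragraph) is kept
    independently with probability `fraction`, using a fixed seed so
    the same subset is selected on every run.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

# Example usage on a toy one-paragraph-per-line corpus.
paragraphs = [f"Paragraph {i}." for i in range(1000)]
sample = downsample(paragraphs, fraction=0.1)
print(len(sample))  # roughly 100 of the 1000 paragraphs
```

If the authors used a different scheme (e.g. capping the total number of paragraphs rather than sampling a fraction), the same structure applies with `rng.sample(lines, k)` instead.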