Closed liangnn17 closed 2 years ago
No, you don't need to tokenize it yourself. You can use the preprocessing script they provided to get the data into a format compatible with GECToR.
Hi @liangnn17, the tokenization for PIE may indeed be a bit different from the one used in the BEA data, but I don't think it would influence the quality significantly.
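If you want to check how large the difference actually is before deciding, here is a minimal sketch that estimates the fraction of sentences whose tokens would change under a different tokenizer. Everything in it is an assumption for illustration: `regex_tokenize` is a hypothetical stand-in and not spaCy's or GECToR's tokenization, and `changed_fraction` is not part of either repository. To test against spaCy, you could pass your own callable that returns a list of token strings.

```python
import re
from typing import Callable, Iterable, List


def regex_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits words from punctuation.

    This is NOT spaCy's rule set; swap in your own callable to compare.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def changed_fraction(
    lines: Iterable[str], tokenize: Callable[[str], List[str]]
) -> float:
    """Return the fraction of non-empty lines whose tokens change
    when the line is re-tokenized with `tokenize`."""
    total = 0
    changed = 0
    for line in lines:
        tokens = line.split()  # existing whitespace-separated tokens
        if not tokens:
            continue
        total += 1
        # Re-tokenize the space-joined sentence and compare token lists.
        if tokenize(" ".join(tokens)) != tokens:
            changed += 1
    return changed / total if total else 0.0


# Made-up sample sentences, not taken from the PIE data.
sample = ["I do n't know .", "She said hello ."]
print(changed_fraction(sample, regex_tokenize))
```

A low fraction would support leaving the data as it is; a high one only tells you the two schemes disagree often, not that model quality will suffer.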
Hi,
I noticed that the tokenization method in the PIE data is different from the one in the NUCLE and FCE data you used. I'm wondering whether I need to detokenize the PIE data and re-tokenize it myself with spaCy.
Looking forward to your advice!