Dear authors, I want to pretrain my own word pair representations on a wiki corpus. According to the instructions, I need to run `python -m embedding.preprocess` to preprocess my corpus. I noticed that `preprocess.py` requires an argument called `pair_count_file`, but I cannot find any code for counting the word pair statistics. Could you tell me how to create this file?