mandarjoshi90 / pair2vec

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Apache License 2.0

How to create the pair count file? #2

Closed Saltychtao closed 5 years ago

Saltychtao commented 5 years ago

Dear authors, I want to pretrain my own word-pair representations on a wiki corpus. According to the instructions, I need to run python -m embedding.preprocess to preprocess my corpus. I noticed that preprocess.py requires an argument called pair_count_file, but I cannot find any code for counting word-pair statistics. Could you tell me how to create this file?

mandarjoshi90 commented 5 years ago

Thanks! I just added embeddings.cooccurance, which should generate the file for you.
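For readers who only want to see what "counting word-pair statistics" can look like, here is a minimal sketch. It is not the repository's embeddings.cooccurance module and does not reproduce its logic. The function names `count_pairs` and `write_pair_counts`, the window size, and the tab-separated output layout are all assumptions made for illustration; check what preprocess.py actually expects for pair_count_file before relying on any of them.

```python
from collections import Counter
from typing import Iterable, List


def count_pairs(sentences: Iterable[List[str]], window: int = 5) -> Counter:
    """Count ordered (left, right) word pairs that co-occur within `window` tokens.

    Hypothetical helper: the window size and the ordered-pair convention are
    assumptions, not taken from the pair2vec code.
    """
    counts: Counter = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Pair the current token with up to `window` tokens to its right.
            for right in tokens[i + 1 : i + 1 + window]:
                counts[(left, right)] += 1
    return counts


def write_pair_counts(counts: Counter, path: str) -> None:
    """Write one 'left<TAB>right<TAB>count' line per pair, most frequent first.

    The file layout is an assumption; the real pair_count_file format may differ.
    """
    with open(path, "w", encoding="utf-8") as f:
        for (left, right), n in counts.most_common():
            f.write(f"{left}\t{right}\t{n}\n")
```

Calling `count_pairs` on an iterable of tokenized sentences and passing the result to `write_pair_counts` yields a plain-text count file. For a full wiki corpus the in-memory `Counter` will likely grow too large, so a real run would need sharding or frequency filtering.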