On the use of cooccurgen.py

Dear William, I’d like to use your code to induce a sentiment lexicon from a new corpus. In your answer to issue #8, you wrote that the first step is to “Use representations/cooccurgen.py to process a corpus and construct co-occurrence matrices.” Looking at cooccurgen.py, it seems to take as input a corpus in the COHA word_lemma_pos format, and it also needs a file called index.pkl.

Do I have to transform my corpus into a tabular format like the COHA format?
How is the index.pkl file created?
Is there any way to use the script starting from a raw corpus?
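
To make the last two questions concrete, here is a minimal sketch of what I imagine the raw-corpus route would look like. It is not taken from socialsent: the function names `build_index` and `count_cooccurrences`, the whitespace tokenization, the window size, and the word-to-integer dictionary are all my own assumptions, and I don't know whether the real index.pkl has this layout.

```python
import pickle
from collections import Counter


def build_index(sentences, min_count=1):
    """Map each sufficiently frequent token to an integer id (hypothetical layout)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(vocab)}


def count_cooccurrences(sentences, index, window=2):
    """Count (word id, context id) pairs within a symmetric window."""
    pairs = Counter()
    for sent in sentences:
        ids = [index[t] for t in sent if t in index]
        for pos, center in enumerate(ids):
            lo = max(0, pos - window)
            hi = min(len(ids), pos + window + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:
                    pairs[(center, ids[ctx_pos])] += 1
    return pairs


# Toy raw corpus: one sentence per line, lowercased and split on whitespace.
raw = ["the movie was great", "the plot was dull"]
sentences = [line.lower().split() for line in raw]

index = build_index(sentences)
cooc = count_cooccurrences(sentences, index, window=2)

# Serialize the index with pickle; an assumption about how index.pkl could be
# produced, not a description of the file the script actually expects.
index_bytes = pickle.dumps(index)
```

If something along these lines is roughly what cooccurgen.py expects, I could skip the COHA-style tabular conversion and build the index directly from my tokenized text.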

Thanks a lot in advance! Best, Rachele
