Closed: colinallen closed this issue 6 years ago
Models trained on Xiaohong's papers picking up only English words
In [3]:
# print the most frequent terms in the document
tf_v.coll_freqs()

Out[3]:
Collection Frequencies

Word          Counts    Word          Counts
science       70        simon         24
discovery     43        information   21
studies       33        computer      20
social        33        langley       20
vol           29        bacon         19
press         29        oxford        19
scientific    28        artificial    18
falun         27        for           18
philosophy    26        cambridge     18
university    25        ibid          18
Fixed. The cause was bad tokenization.
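For anyone hitting the same symptom, here is a minimal sketch of one way tokenization can produce an English-only vocabulary. It is not the actual fix applied here. The sample string and both regex patterns are made up for illustration, and the assumption is that the original tokenizer only matched ASCII letters.

```python
import re
from collections import Counter

# Hypothetical mixed-language sample, pre-segmented with spaces for simplicity.
text = "科学 发现 science discovery 科学"

# An ASCII-only pattern silently drops every non-Latin token.
ascii_tokens = re.findall(r"[a-z]+", text.lower())

# In Python 3, \w is Unicode-aware for str patterns, so CJK characters are kept.
unicode_tokens = re.findall(r"\w+", text.lower())

print(Counter(ascii_tokens))
print(Counter(unicode_tokens))
```

Real Chinese text is not space-delimited, so a word segmenter would probably still be needed on top of a Unicode-aware pattern.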