microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License
842 stars 235 forks

data prepare #81

Open vision-zhao opened 5 years ago

vision-zhao commented 5 years ago

I have a transcript.txt file and want to use it to train an LDA model. The model is fed by two files like 'docword.nytimes.txt' and 'vocab.nytimes.txt', so I wonder how to convert my transcript into exactly that format. (I wrote a script to process my file, but it didn't work.) I would be very grateful if you could tell me the exact format I need, whenever it is convenient for you!
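
For reference, 'docword.nytimes.txt' and 'vocab.nytimes.txt' come from the UCI Bag of Words dataset. In that layout the vocab file holds one word per line (the line number is the word ID, starting at 1), and the docword file starts with three header lines (number of documents D, vocabulary size W, number of non-zero entries NNZ) followed by one `docID wordID count` triple per line, with both IDs starting at 1. Below is a minimal sketch of a converter, not the project's own tooling. It assumes transcript.txt holds one document per line and that lowercasing plus whitespace splitting is acceptable tokenisation; the function names and output file names are hypothetical.

```python
from collections import Counter


def to_uci_bow(documents):
    """Return (vocab, docword_lines) in the UCI bag-of-words layout.

    Assumption: each element of `documents` is one document, and tokens
    are obtained by lowercasing and splitting on whitespace.
    """
    tokenised = [doc.lower().split() for doc in documents]

    # Sorted vocabulary; word IDs are 1-based line numbers in the vocab file.
    vocab = sorted({tok for tokens in tokenised for tok in tokens})
    word_id = {word: i for i, word in enumerate(vocab, start=1)}

    # One "docID wordID count" triple per distinct word in each document.
    triples = []
    for doc_id, tokens in enumerate(tokenised, start=1):
        counts = Counter(tokens)
        for word in sorted(counts, key=word_id.__getitem__):
            triples.append(f"{doc_id} {word_id[word]} {counts[word]}")

    # Header lines: D (documents), W (vocabulary size), NNZ (triples).
    header = [str(len(tokenised)), str(len(vocab)), str(len(triples))]
    return vocab, header + triples


def write_lines(path, lines):
    """Write one item per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


def convert(transcript_path, docword_path, vocab_path):
    """Read a one-document-per-line transcript and write both output files."""
    with open(transcript_path, encoding="utf-8") as handle:
        documents = [line.strip() for line in handle if line.strip()]
    vocab, docword = to_uci_bow(documents)
    write_lines(vocab_path, vocab)
    write_lines(docword_path, docword)
```

A call such as `convert("transcript.txt", "docword.transcript.txt", "vocab.transcript.txt")` would produce the two files. If the transcript is one long text rather than one document per line, it would first need to be split into documents (for example by speaker turn or paragraph), since a single document gives a topic model nothing to compare against.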