Data preparation

I have a transcript.txt file and want to use it to train an LDA model, which takes two input files like 'docword.nytimes.txt' and 'vocab.nytimes.txt'. How should I convert my transcript into exactly that format? (I wrote a script to process my file, but it didn't work.) I would be very grateful if you could tell me the exact format I need when you have time!
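
For context, the two file names match the UCI "Bag of Words" dataset layout, so I am assuming that is the expected format: the vocab file has one word per line (line n is word ID n), and the docword file starts with three header lines (number of documents D, vocabulary size W, number of non-zero counts NNZ) followed by one `docID wordID count` triple per line, with both IDs starting at 1. Below is a minimal sketch of the conversion I am attempting. The function names `build_bow` and `write_uci`, the simple regex tokenizer, and the idea that each non-empty line of the transcript is one document are my own assumptions, not something this repository specifies.

```python
import re
from collections import Counter


def build_bow(docs):
    """Turn a list of document strings into (vocab, triples).

    vocab   : sorted list of unique words; word ID = position + 1
    triples : list of (doc_id, word_id, count), IDs are 1-based
    """
    # Assumed tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    tokenized = [re.findall(r"[a-z0-9']+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    word_id = {w: i for i, w in enumerate(vocab, start=1)}

    triples = []
    for doc_id, toks in enumerate(tokenized, start=1):
        counts = Counter(toks)
        # Emit words in ascending word-ID order within each document.
        for w in sorted(counts, key=word_id.get):
            triples.append((doc_id, word_id[w], counts[w]))
    return vocab, triples


def write_uci(docs, docword_path, vocab_path):
    """Write docs as a docword/vocab file pair in the assumed UCI layout."""
    vocab, triples = build_bow(docs)
    with open(vocab_path, "w", encoding="utf-8") as f:
        for w in vocab:
            f.write(w + "\n")
    with open(docword_path, "w", encoding="utf-8") as f:
        # Header: D (documents), W (vocabulary size), NNZ (number of triples).
        f.write(f"{len(docs)}\n{len(vocab)}\n{len(triples)}\n")
        for d, w, c in triples:
            f.write(f"{d} {w} {c}\n")
```

To use it, I would read transcript.txt, keep the non-empty lines as the `docs` list, and call `write_uci(docs, "docword.transcript.txt", "vocab.transcript.txt")`; the output file names here are only placeholders. If the model expects stop-word removal or a different way of splitting the transcript into documents, that part would need to change.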