Hey @TimLChan, this looks good to me. Thanks so much for contributing it. A couple of very small requests before I merge:
Could you include a couple of sentences in the README about 1) what the Twitter archive is and how you get yours, and 2) how to run this script to generate your own test corpus?
Fixes #48. Adds basic functionality to take a tweet archive (CSV) and convert it into a corpus.
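For anyone reading along, here is a rough sketch of what a CSV-to-corpus conversion like this could look like. It is not the code from this PR: the function name `tweets_csv_to_corpus` is hypothetical, and it assumes the archive CSV has a header row with a `text` column holding the tweet body.

```python
import csv
import io


def tweets_csv_to_corpus(csv_file, text_column="text"):
    """Return a corpus string with one tweet per line.

    `csv_file` is an open text file object. The `text` column name is an
    assumption about the archive layout, not something taken from this PR.
    """
    reader = csv.DictReader(csv_file)
    lines = []
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if text:
            # Collapse internal whitespace so each tweet stays on one line.
            lines.append(" ".join(text.split()))
    return "\n".join(lines)


# Tiny in-memory stand-in for an archive file (made-up sample data).
sample = io.StringIO('tweet_id,text\n1,"hello world"\n2,"second\ntweet"\n3,\n')
corpus = tweets_csv_to_corpus(sample)
print(corpus)
```

With a real archive you would pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the `StringIO` sample.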