This Twitter bot tweets article titles produced by an n-gram generator trained on titles from The Hoya, Georgetown's student newspaper. The titles were scraped from The Hoya's online archive, which goes back to 1998. The n-gram generator was adapted from code provided by Amir Zeldes in an intro NLP course. The bot uses Molly White's Twitter bot framework, provided here.
For more detail on how and why I built this, check out the write-up on my website.
The following is a short description of each of the included files:
File | Description |
---|---|
sitescraper.py | Script to scrape thehoya.com titles from HTML using BeautifulSoup |
hoyatitles.txt | Scraped titles from the online archives of The Hoya 1998-2018 |
getcounts.py | Script to count trigram continuations from hoyatitles.txt |
counts.pkl | Trigram continuation counts used by generate.py, stored as a pickle for efficiency |
generate.py | Script to generate new titles from the pickled counts |
bot.py | Script to tweet out the titles generated by generate.py |
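
To illustrate the idea behind getcounts.py and generate.py, here is a minimal sketch of counting trigram continuations and sampling a title from them. It is not the code in this repository: the function names `count_continuations` and `generate_title`, the `<s>`/`</s>` padding tokens, and the whitespace tokenization are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers; the real scripts may pad titles differently.
START, END = "<s>", "</s>"


def count_continuations(titles):
    """Map each two-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for title in titles:
        # Pad with two start tokens so the first real word has a full context.
        tokens = [START, START] + title.split() + [END]
        for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
            counts[(first, second)][third] += 1
    return counts


def generate_title(counts, rng=random, max_words=30):
    """Sample words one at a time, weighted by how often each followed the context."""
    context = (START, START)
    words = []
    while len(words) < max_words:
        options = counts.get(context)
        if not options:
            break  # unseen context: stop rather than guess
        next_word = rng.choices(list(options), weights=list(options.values()))[0]
        if next_word == END:
            break
        words.append(next_word)
        context = (context[1], next_word)
    return " ".join(words)


if __name__ == "__main__":
    # Toy input standing in for the lines of hoyatitles.txt.
    sample_titles = ["Hoyas win big game", "Hoyas lose big game"]
    counts = count_continuations(sample_titles)
    print(generate_title(counts))
```

In a setup like this one, the counts would be computed once, saved with `pickle.dump`, and reloaded with `pickle.load` at generation time, so the title file does not have to be re-read on every run.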