ryanamannion / thehoyabot

Twitter bot that generates and tweets novel article titles based on a sample text

TheHoyaBot

This Twitter bot tweets article titles generated by an n-gram generator trained on titles from The Hoya, Georgetown's student newspaper. The titles were scraped from The Hoya's online archive, which goes back to 1998. The n-gram generator was adapted from code provided by Amir Zeldes in an intro NLP course. The bot is built on Molly White's Twitter bot framework.
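
To illustrate the general idea, here is a minimal sketch of a trigram title generator. It is not the code in this repository: the function names, the `<s>`/`</s>` boundary markers, and the three sample titles are all invented for the example, and the real scripts may tokenize and sample differently.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers, not from the repo


def count_trigrams(titles):
    """Map each two-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for title in titles:
        # Pad with two START tokens so the first real word has a full context.
        tokens = [START, START] + title.split() + [END]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            counts[(w1, w2)][w3] += 1
    return counts


def generate_title(counts, rng, max_words=20):
    """Sample one word at a time, weighted by how often it followed the context."""
    context = (START, START)
    words = []
    while len(words) < max_words:
        continuations = counts[context]
        word = rng.choices(list(continuations), weights=list(continuations.values()))[0]
        if word == END:
            break
        words.append(word)
        context = (context[1], word)  # slide the two-word window forward
    return " ".join(words)


# Invented sample titles, standing in for the scraped data.
sample_titles = [
    "Students Protest Tuition Increase",
    "Students Protest Dining Changes",
    "University Announces Tuition Increase",
]
counts = count_trigrams(sample_titles)
title = generate_title(counts, random.Random(0))
print(title)
```

With only three training titles the generator can do little more than reproduce its input. Novel titles emerge once the same two-word context appears in many different titles, as it would across a large scraped archive.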

For more detail on how and why I built this, check out the write-up on my website.

Files

The following is a short description of each of the included files:

| File | Description |
| --- | --- |
| sitescraper.py | Script to scrape thehoya.com titles from HTML using BeautifulSoup |
| hoyatitles.txt | Scraped titles from the online archives of The Hoya, 1998-2018 |
| getcounts.py | Script to count trigram continuations from hoyatitles.txt |
| counts.pkl | The counts of the trigram continuations for use in generate.py, in pickled format for efficiency |
| generate.py | Script to generate new titles from the pickled file |
| bot.py | Script to tweet out the titles generated by generate.py |
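
The hand-off between the counting and generation steps can be sketched as a pickle round trip: the counts are written once, then loaded without re-reading the titles. This is an illustration only. The layout of the actual counts.pkl is not documented here, so the dictionary below (two-word context mapped to a `Counter` of following words) and its values are assumptions.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical counts table: two-word context -> Counter of following words.
counts = {
    ("Students", "Protest"): Counter({"Tuition": 3, "Dining": 1}),
    ("Protest", "Tuition"): Counter({"Increase": 3}),
}

# A temporary directory keeps the example from touching any real file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "counts.pkl"

    # Counting step: serialize the table once.
    with path.open("wb") as f:
        pickle.dump(counts, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Generation step: load the table instead of recounting the titles.
    with path.open("rb") as f:
        loaded = pickle.load(f)

print(loaded == counts)
```

Pickle files can execute arbitrary code when loaded, so this pattern is only appropriate for files you created yourself.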