ottofabian / NLP4Web_Project

NLP4Web project repository, WS 17/18, TU Darmstadt

Find possible Twitter accounts and parameters for data #5

Open ottofabian opened 6 years ago

ottofabian commented 6 years ago

This issue is for collecting possible blogs so that we have enough data for training and testing.

mrnyc54 commented 6 years ago

http://slatestarcodex.com/

mrnyc54 commented 6 years ago

  • Alison Kent blog by Alison Kent
  • Between the Lines by David Allen
  • Beyond the Beyond by Bruce Sterling
  • Blatherings by Debbie Ridpath Ohi
  • Buzz, Balls and Hype by M.J. Rose
  • Cabbages and Kings by PJ Parrish
  • Carolyn's Blog by Carolyn Jewel
  • C.J. Barry's Blog by C.J. Barry
  • Contrary Brin by David Brin
  • Craphound by Cory Doctorow
  • Dave Barry's Blog by Dave Barry
  • Diary of a Mad Romance Author by Kathleen O'Reilly
  • The Dilbert Blog by Scott Adams
  • Dispatches from Tanganyika by Poppy Z. Brite
  • Freakonomics by Steven D. Levitt and Stephen J. Dubner
  • Greek Tragedy by Stephanie Klein
  • GreggHurwitzWeblog by Gregg Hurwitz
  • Gus Openshaw's Whale-Killing Journal by Keith Thomson
  • J-Walk Blog by John Walkenbach
  • Krentz Quick & Castle Blog by Jayne Ann Krentz
  • Lessig Blog by Lawrence Lessig
  • LKH Blog by Laurell K. Hamilton
  • Meg's Diary by Meg Cabot
  • Neil Gaiman's Journal by Neil Gaiman
  • No rules. Just write by Brenda Coulter
  • Novelesque: House of Sand and Blog by Douglas Clegg
  • Paperback Writer by Lynn Viehl
  • Patti O'Shea's blog by Patti O'Shea
  • PeterDavid.net by Peter David
  • Pocket Full of Words by Holly Lisle
  • Pop Culture Magazine by Bill Crider
  • Searchblog by John Battelle
  • Seth's Blog by Seth Godin
  • Slay Your Demons by Julie Kenner
  • SnarkSpot by Jennifer Weiner
  • Tess Gerritsen Blog by Tess Gerritsen
  • Web Petals by Marjorie M. Liu
  • Wil Wheaton Dot Net by Wil Wheaton
  • A Writer's Life by Lee Goldberg
  • Writing Fiction by Crawford Kilian

mrnyc54 commented 6 years ago

The list above is from https://www.writerswrite.com/authorblogs/

mrnyc54 commented 6 years ago

Changed the topic of this issue to fit the new Twitter agenda. Still to be decided:

  1. Which domains are we starting with?
  2. Which Twitter accounts do we assign to each domain?
  3. How many Twitter accounts/authors do we need to crawl?
  4. How many tweets per author do we crawl?
  5. Do we need additional criteria for tweets (e.g. must contain hashtags, no fewer than 100 characters, ...)?
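
If we do settle on additional criteria (question 5), a filter along these lines might be enough. This is only a sketch: the function name `keep_tweet`, the hashtag pattern, and both defaults are assumptions based on the two examples in the question, not decisions.

```python
import re

# Assumed thresholds, copied from the examples in question 5 (nothing is decided yet).
MIN_CHARS = 100
HASHTAG_RE = re.compile(r"#\w+")


def keep_tweet(text: str, min_chars: int = MIN_CHARS, require_hashtag: bool = True) -> bool:
    """Return True if the tweet text passes the optional criteria."""
    # "No fewer than 100 characters": tweets with exactly min_chars characters pass.
    if len(text) < min_chars:
        return False
    # Optionally require at least one hashtag somewhere in the text.
    if require_hashtag and not HASHTAG_RE.search(text):
        return False
    return True
```

Both checks are parameters so either one can be relaxed or switched off once we know how many tweets survive the filter.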
mrnyc54 commented 6 years ago

Started working on the Twitter crawler with the following assumptions:

  1. Start with the politics domain.
  2. Start with these accounts (8 in total, from http://www.businessinsider.com/15-politicians-on-twitter-you-must-follow-2013-7?IR=T#rep-jim-himes-d-conn-6):
    • Trump
    • Obama
    • Sen. Chuck Grassley (R-Iowa)
    • Rep. Jared Polis (D-Colo.)
    • London Mayor Boris Johnson
    • Sen. Claire McCaskill (D-Mo.)
    • Gov. Chris Christie (R-N.J.)
    • Rep. Jim Himes (D-Conn.)
  3. Aiming for 15 authors. The first milestone is more than 8.
  4. Crawl the latest 1,000 tweets per author.
  5. No additional criteria so far.
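
These assumptions could be captured in a small config object so the crawler does not hard-code them. A rough sketch only: `CrawlConfig`, `crawl`, and the `fetch_latest` callable are hypothetical names, and `fetch_latest` stands in for whatever Twitter client we end up using (no real API call is shown here).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CrawlConfig:
    domain: str = "politics"        # assumption 1
    target_authors: int = 15        # assumption 3
    tweets_per_author: int = 1000   # assumption 4: latest tweets only
    accounts: List[str] = field(default_factory=list)


def crawl(config: CrawlConfig,
          fetch_latest: Callable[[str, int], List[str]]) -> Dict[str, List[str]]:
    """Collect the latest tweets per account with no extra filtering (assumption 5).

    fetch_latest(handle, limit) is a hypothetical callable that returns
    tweet texts for one account, newest first.
    """
    corpus: Dict[str, List[str]] = {}
    for handle in config.accounts:
        tweets = fetch_latest(handle, config.tweets_per_author)
        # Guard against a fetcher that returns more than requested.
        corpus[handle] = tweets[: config.tweets_per_author]
    return corpus
```

Passing the fetcher in as a parameter keeps the crawl loop testable with a stub before any API credentials are set up.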
mrnyc54 commented 6 years ago

Added: "theresa_may", "jeremycorbyn", "David_Cameron", "BernieSanders", "RonPaul", "SpeakerRyan", "mike_pence"

Now at 15 authors in total.

mrnyc54 commented 6 years ago

Added a few more. Now 20 in total (various politicians from the UK & USA):

"realDonaldTrump", "BarackObama", "ChuckGrassley", "RepJaredPolis", "BorisJohnson", "clairecmc", "ChrisChristie", "jahimes", "jeremycorbyn", "CarolineLucas", "David_Cameron", "BernieSanders", "RonPaul", "SpeakerRyan", "mike_pence", "DavidLammy", "timfarron", "Ed_Miliband", "ChukaUmunna", "tom_watson"

Rejected: theresa_may (fewer than 600 tweets)
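
The rejection rule could be automated with a check like the one below. This is a sketch under assumptions: the 600-tweet cutoff is inferred from the theresa_may case rather than being an agreed rule, `split_accounts` is a hypothetical helper, and the counts in the usage example are placeholders, not real account statistics.

```python
from typing import Dict, List, Tuple

# Assumed cutoff, inferred from the theresa_may rejection above.
MIN_TWEETS = 600


def split_accounts(tweet_counts: Dict[str, int],
                   min_tweets: int = MIN_TWEETS) -> Tuple[List[str], List[str]]:
    """Split handles into (accepted, rejected) by number of available tweets."""
    accepted = [h for h, n in tweet_counts.items() if n >= min_tweets]
    rejected = [h for h, n in tweet_counts.items() if n < min_tweets]
    return accepted, rejected


# Placeholder handles and counts, for illustration only.
accepted, rejected = split_accounts({"account_a": 3200, "account_b": 450})
```

With the placeholder input, `account_a` ends up in `accepted` and `account_b` in `rejected`.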

Suggestions for more domains or more authors are welcome!