ryanmcdermott / trump-speeches

1mb Archive of Donald Trump Speeches

Split by Speech #3

Open thedansimonson opened 8 years ago

thedansimonson commented 8 years ago

Thanks for making this available!

While the text itself is nice to have, some more interesting tasks can be done if the data is split into separate speeches, in some form: e.g. looking at how his rhetoric evolved over time, generating narrative schemas from his text, etc.

It's hard to run NLP on the text without adding artificial document splits, and a lot more could be done with it once a few parses are layered on top.
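To make the request concrete, here is a minimal sketch of what per-speech splitting could look like once the combined file carries some kind of boundary marker. The marker format (`SPEECH 1`, `SPEECH 2`, ... on its own line) and the `split_speeches` helper are assumptions for illustration only; nothing in this thread confirms how the archive marks, or will mark, speech boundaries.

```python
import re

# Hypothetical boundary marker: a line such as "SPEECH 1" before each speech.
# This format is an assumption, not something the archive is known to use.
SPEECH_MARKER = re.compile(r"^SPEECH \d+[ \t]*$", re.MULTILINE)


def split_speeches(corpus):
    """Split a combined corpus into a list with one string per speech."""
    parts = SPEECH_MARKER.split(corpus)
    # Drop empty chunks (e.g. the text before the first marker) and trim whitespace.
    return [part.strip() for part in parts if part.strip()]


sample = "SPEECH 1\nThank you.\n\nSPEECH 2\nIt is great to be here.\n"
speeches = split_speeches(sample)
```

Each element of `speeches` can then be treated as its own document, for example by feeding it to a parser or tracking it by position in the list.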

ryanmcdermott commented 8 years ago

Addressing #2 here.

Thank you for pointing that out. I foolishly overlooked this when scraping. It won't be too hard to work out which speech each part came from, and you're right: the more interesting work becomes possible when you have separate documents and more metadata. I was training an RNN on the data, so I needed it all in one place and didn't think to keep the speeches separate.

thedansimonson commented 8 years ago

Thanks, looking forward to an update!

reuning commented 8 years ago

Just want to say that I'm also very interested in where the speech data comes from and would be happy to help expand this.

StoneCypher commented 8 years ago

addressed in https://github.com/ryanmcdermott/trump-speeches/pull/4