Analysis of available tweets from the 21N protests in Colombia (Work In Progress!). The available data covers tweets sent from shortly before the beginning of the protests on the 21st of November to a couple of days after. The actual demonstrations lasted for weeks, but we focused on the data from the beginning of the event.
Note: The colors in the graph are the result of a topic detection algorithm.
We used the rtweet R package to fetch a sample of tweets during the protest period. We tried to cover a broad spectrum of topics via the following keywords:
"21N",
"#21N",
"#21NSomosTodos",
"#Paro21N",
"#YoMarchoEste21",
"#YoMarchoEl21",
"#YoNoMarchoEste21",
"#YoNoMarchoEl21",
"#RazonesParaMarchar",
"#RazonesParaNoMarchar",
"#100RazonesParaMarchar",
"#100RazonesParaNoMarchar",
"#YoNoParo",
"#YoParoEl21NSinMiedo",
"#Cacerolazo",
"#22N",
"#23N",
"#ToqueDeQueda"
Here is the script we used to fetch the data. The sample data is stored as an .rds file and contains 1,006,484 tweets.
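Independently of that script, the following is a minimal sketch of how keywords like those above could be combined into a single search query for rtweet. It is an illustration under stated assumptions, not the code used for this project: the helper names `build_query` and `fetch_sample`, the value of `n`, and the output path are hypothetical, and the call requires valid Twitter API credentials.

```r
# Illustrative subset of the keywords listed above.
keywords <- c("21N", "#21N", "#Paro21N", "#Cacerolazo")

# Join the keywords with OR so that a tweet matching any of them is returned.
build_query <- function(terms) {
  paste(terms, collapse = " OR ")
}

query <- build_query(keywords)

# Hypothetical wrapper around rtweet::search_tweets(); it needs the rtweet
# package and API credentials, so it is defined here but not called.
fetch_sample <- function(q, n = 18000) {
  rtweet::search_tweets(
    q = q,
    n = n,
    include_rts = TRUE,       # keep retweets, needed for retweet networks
    retryonratelimit = TRUE   # wait and resume when the rate limit is hit
  )
}

# Hypothetical usage and output path (assumptions, not the project's own):
# tweets <- fetch_sample(query)
# saveRDS(tweets, "data/tweets_21N.rds")
```

Building the query as a separate step makes it easy to inspect the exact search string before spending any API quota.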
In this repository you can find some data pre-processing and initial exploratory data analysis: word counts, topic modeling and network (retweet) analysis.
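As a rough illustration of two of these steps, the sketch below computes word counts and a retweet edge list in base R on a toy data frame. The data is invented for the example, and the column names (`screen_name`, `retweet_screen_name`, `text`) are assumptions about the shape of the tweet table; the scripts in this repository may proceed differently.

```r
# Toy stand-in for the tweet sample; values and column names are assumptions.
tweets <- data.frame(
  screen_name = c("ana", "luis", "ana", "sofia"),
  retweet_screen_name = c(NA, "ana", NA, "ana"),
  text = c("Hoy marchamos #21N", "RT hoy marchamos #21N",
           "Cacerolazo esta noche", "RT hoy marchamos #21N"),
  stringsAsFactors = FALSE
)

# Word counts: lowercase the text, split on whitespace, and tabulate.
words <- unlist(strsplit(tolower(tweets$text), "\\s+"))
word_counts <- sort(table(words), decreasing = TRUE)

# Retweet edge list: one row per retweet, from the retweeting account
# to the account whose tweet was retweeted.
is_rt <- !is.na(tweets$retweet_screen_name)
edges <- data.frame(
  from = tweets$screen_name[is_rt],
  to = tweets$retweet_screen_name[is_rt],
  stringsAsFactors = FALSE
)

# Number of times each account was retweeted (in-degree in the network).
retweeted <- table(edges$to)

print(head(word_counts))
print(edges)
```

On real data, a tokenizer that handles punctuation, stop words, and URLs would be preferable to a plain whitespace split.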
We have not had the capacity to dig deeper into the analysis, but we believe there are interesting insights to be extracted from this data. If you would like access to the data and/or to contribute to this repository, please do not hesitate to contact us. You can also create an issue to suggest ideas or directions for future development.
Rscript R/data_processing.R