The script filtermerge.py works in this way: `lookFor`, at line 7 of the script, looks for entries matching the label from the semantics file, and running `python filtermerge.py outputname.json` creates outputname.json.
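Since the script itself is not pasted here, below is a minimal sketch of the filtering logic described above; the `lookFor` value, the structure of the semantics file, and the field names are assumptions for illustration, not the actual fbtrex format.

```python
import json
import sys

# Hypothetical label to match; in the real filtermerge.py the
# `lookFor` variable is defined around line 7.
lookFor = 'maldonado'

def filter_entries(semantics_path, output_path):
    # Load the semantics file (assumed format: a JSON list of
    # entries, each carrying a `labels` field).
    with open(semantics_path) as f:
        entries = json.load(f)

    # Keep only the entries whose labels match `lookFor`.
    matched = [e for e in entries if lookFor in e.get('labels', [])]

    # Write the filtered records to the requested output file.
    with open(output_path, 'w') as f:
        json.dump(matched, f, indent=2)

if __name__ == '__main__':
    # Usage: python filtermerge.py outputname.json
    filter_entries('semantics.json', sys.argv[1])
```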
This table represents the number of feeds collected from the sources during the entire period of the experiment. These are feeds published by the selected pages and impressed on our users' timelines.
From all those feeds we filtered the posts relating to Santiago Maldonado's murder. Here it is possible to see the number of records after filtering.
To analyse the semantic trend related to the issue we selected a group of terms used to tell the story during the period. Some of them are words used to describe the facts, so that we can not only evaluate the trend quantitatively but also understand it from a qualitative point of view: differences in how the facts are explained can highlight both the way sources use terms and the way the Facebook algorithm filters the story. Names of politicians are included for the same reasons, but also because, in an election period, they can give us an idea of the impact of the story on the political debate.
Here we have the distribution of Maldonado's feeds per day
To gain deeper insight into the trend we analyse the distribution of the selected terms during the period of the experiment. From a quantitative measure such as the number of records per day it is possible to see both the growth in the number of records and the semantic changes in the terms used to report the story.
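For reference, the per-day counts can be computed along these lines; the `when` and `text` field names, and the term list, are placeholders for whatever the filtered JSON actually carries.

```python
import json

import pandas as pd

# Load the filtered feeds produced by filtermerge.py.
with open('outputname.json') as f:
    feeds = pd.DataFrame(json.load(f))

# Assumed fields: `when` (timestamp) and `text` (post content).
feeds['day'] = pd.to_datetime(feeds['when']).dt.date

# Number of Maldonado records impressed per day.
print(feeds.groupby('day').size())

# Occurrences per day of each selected term (hypothetical list).
terms = ['gendarmeria', 'cuerpo', 'desaparicion']
for term in terms:
    hits = feeds[feeds['text'].str.contains(term, case=False, na=False)]
    print(term, hits.groupby('day').size().to_dict())
```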
The distribution of terms across the days allows us to see how the trend changes in response to a specific event. In particular we focus here on 17-10-17, the day Maldonado's body was found. These are the most used terms to report the news in the days just before the discovery, when few sources were publishing stories about Maldonado.
To better understand this, here is the distribution of feeds
And the distribution of sources
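A sketch of how the pre-discovery window can be isolated; it reuses the same assumed fields as above, with `source` as an assumed column name for the publishing page.

```python
import json
import re
from collections import Counter

import pandas as pd

# Same assumed fields as in the previous sketch.
feeds = pd.DataFrame(json.load(open('outputname.json')))
feeds['when'] = pd.to_datetime(feeds['when'])

# Restrict to the days just before the body was found on 17-10-17.
window = feeds[(feeds['when'] >= '2017-10-10') &
               (feeds['when'] < '2017-10-17')]

# Most frequent words in the window, as a rough proxy for the
# most used terms shown in the table.
words = Counter()
for text in window['text'].dropna():
    words.update(re.findall(r'\w{4,}', text.lower()))
print(words.most_common(20))

# Distribution of feeds per day and of feeds per source.
print(window.groupby(window['when'].dt.date).size())
print(window['source'].value_counts())
```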
To see how the trend is influenced by the algorithm we also need to understand how Maldonado's feeds are impressed. Here is a table where it is possible to see how the feeds are distributed across users' timelines.
As we can see from the table above, the distribution of feeds across users' timelines is very homogeneous: every post appears just once in each timeline and no post seems to be preferred; the only source with more impressions per timeline is 'Perfilcom'. This homogeneous pattern is not present in the days after the 17-10. Once the news becomes big and the amount of posts grows, the bigger sources seem to be preferred, reaching a percentage of impressions much larger than that of the small sources. From the table beside it is possible to see how, after the 17-10, some sources start to have a much bigger impact on the timelines than others, in particular Todonoticias, Perfilcom, LaNacion and Pagina12OK.
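One way to check the homogeneity claim is a source-by-timeline pivot, assuming each record carries a `user` field identifying the timeline it was impressed on (a hypothetical name, like the others here).

```python
import json

import pandas as pd

# Assumed fields: `source` (publishing page) and `user`
# (the timeline the feed was impressed on).
feeds = pd.DataFrame(json.load(open('outputname.json')))

# Impressions of each source in each user's timeline.
per_timeline = feeds.pivot_table(index='source', columns='user',
                                 values='when', aggfunc='count',
                                 fill_value=0)
print(per_timeline)

# Average impressions per timeline: values close to 1 everywhere
# match the homogeneous pattern described above, while higher
# values flag preferred sources such as 'Perfilcom'.
print(per_timeline.mean(axis=1).sort_values(ascending=False))
```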
As the two tables below show, the impact of small sources on the timelines seems to be related to the total number of posts. If the story is not at the centre of the public debate there is enough space on the timelines for all the sources, but once the news gains a huge impact on the public debate the competition to appear on the timelines seems to become unfair: big sources often gain a much larger presence on users' timelines.

The difference becomes even clearer if we focus on specific days. We chose 12-10, the day with the most records before 17-10, and 18-10, the day the news about the discovery of the body came out.
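The day-by-day comparison can be reproduced with something like this sketch (same assumed columns as above):

```python
import json

import pandas as pd

feeds = pd.DataFrame(json.load(open('outputname.json')))
feeds['day'] = pd.to_datetime(feeds['when']).dt.strftime('%Y-%m-%d')

# Share of impressions per source on a quiet day (12-10) versus
# the day after the body was found (18-10).
for day in ('2017-10-12', '2017-10-18'):
    share = feeds.loc[feeds['day'] == day, 'source'] \
                 .value_counts(normalize=True) * 100
    print(day, share.round(1).to_dict())
```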
The first step to get deeper into the data is to look at some specific keywords and understand their behaviour during the entire period of the experiment. Has a given keyword recorded a constantly ascending, descending or stable trend? Or has something influenced its trend, making it rise or fall rapidly? Looking for the reason behind those changes can help us understand the behaviour of the algorithm: it shows where the main cause lies, whether inside the social network or outside, and in the latter case how the algorithm reacts to inputs from outside.

To go even deeper, it is also possible to study the terms used by sources to describe and tell a specific event. This kind of study can show how a specific word, with a fixed meaning, can be preferred over others, allowing us to determine how the algorithm filters some terms and thereby decides by itself the impact of the event on the community. The aim here is the visualisation of a semantic trend over the period of the experiment: how can we represent the evolution of the use of a term in both a qualitative and a quantitative way? A sentiment analysis is also provided through dandelion.eu.
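For the sentiment part, a minimal call to dandelion.eu could look like the sketch below. It targets the datatxt Sentiment Analysis endpoint with the documented `text`, `lang` and `token` parameters; the token is a placeholder, and whether Spanish is among the supported languages should be verified against the current docs before running it on the Argentine feeds.

```python
import requests

DANDELION_TOKEN = 'your-token-here'  # placeholder, not a real token

def sentiment(text, lang='en'):
    # dandelion.eu datatxt Sentiment Analysis endpoint.
    resp = requests.get(
        'https://api.dandelion.eu/datatxt/sent/v1',
        params={'text': text, 'lang': lang, 'token': DANDELION_TOKEN},
    )
    resp.raise_for_status()
    data = resp.json()
    # The response carries a sentiment type (positive / neutral /
    # negative) and a score in [-1, 1].
    return data['sentiment']['type'], data['sentiment']['score']

print(sentiment('They found the body of Santiago Maldonado'))
```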