tracking-exposed / experiments-data

Dataset, documentation and publication from the experiments run by https://tracking.exposed
https://tracking.exposed

Topics flow #3

Open fedebarba opened 7 years ago

fedebarba commented 7 years ago

The first step to get deeper into the data is to look at some specific keywords and understand their behaviour during the entire period of the experiment. Did a specific keyword record a constant ascending, descending, or stable trend? Or did something influence its trend, making it rise or fall rapidly? Looking for the reason behind those changes can help us understand the behaviour of the algorithm: it shows where the main cause lies, whether inside the social network or outside, and in the latter case how the algorithm reacts to inputs from outside.

To go even deeper in the analysis it is also possible to study the terms used by sources to describe and narrate a specific event. This kind of study can show how a specific word, with a fixed meaning, can be preferred to others, letting us determine how the algorithm filters some terms, deciding by itself the impact of the event on the community.

The aim here is the visualization of a semantic trend over the period of the experiment. How can we represent the evolution of the use of a term in both a qualitative and a quantitative way? A sentiment analysis is also provided through dandelion.eu
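The quantitative half of this question can be sketched with a small helper that counts, per day, how many collected posts mention a keyword. This is a minimal sketch with hypothetical sample data, not the project's actual pipeline; the dandelion.eu sentiment step is not reproduced here.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of collected feeds: (publication date, post text).
feeds = [
    (date(2017, 10, 16), "Donde esta Santiago Maldonado?"),
    (date(2017, 10, 17), "Encontraron el cuerpo de Maldonado"),
    (date(2017, 10, 17), "Maldonado: la autopsia confirma"),
]

def keyword_trend(feeds, keyword):
    """Count how many posts mention `keyword` on each day."""
    trend = Counter()
    for day, text in feeds:
        if keyword.lower() in text.lower():
            trend[day] += 1
    return dict(sorted(trend.items()))

print(keyword_trend(feeds, "maldonado"))
# one entry per day, e.g. {date(2017, 10, 16): 1, date(2017, 10, 17): 2}
```

Plotting that dictionary over the 16 days of the experiment gives the trend line the analysis below refers to.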

vecna commented 7 years ago

The script filtermerge.py works in this way:
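The actual filtermerge.py is not reproduced in this thread; the following is a minimal sketch under the assumption that it merges the per-source exports into one dataset and keeps only the rows whose text matches the topic keywords (the keyword list and the `text` column name are hypothetical).

```python
import csv
import glob

KEYWORDS = {"maldonado"}  # hypothetical topic filter

def keep(row):
    """True if the row's text mentions one of the topic keywords."""
    return any(k in row.get("text", "").lower() for k in KEYWORDS)

def filter_merge(pattern, out_path):
    """Merge every CSV matching `pattern` into one filtered file."""
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    if writer is None:
                        writer = csv.DictWriter(out, fieldnames=row.keys())
                        writer.writeheader()
                    if keep(row):
                        writer.writerow(row)
```

Splitting the match test into `keep()` keeps the filter logic separate from the merge I/O.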

fedebarba commented 7 years ago

This table represents the number of feeds collected from the sources during the entire period of the experiment. These are feeds published by the selected pages and impressed on our users' timelines.

image

From all those feeds we filtered the posts relating to Santiago Maldonado's murder. Here it is possible to see the number of records once filtered.

image

To analyse the semantic trend related to the issue we decided to select a group of terms used to tell the story during the period. Some of them are words used to describe the facts: this makes it possible not just to evaluate the trend quantitatively but also to understand it from a qualitative point of view. The differences in how the facts are explained can in fact highlight both the way sources use terms and the way the Facebook algorithm filters the story. Names of politicians are used for the same reasons, but also because, in an election period, they can give us an idea of the impact of the story on the political debate.

image

Here we have the distribution of Maldonado's feeds per day

image

fedebarba commented 7 years ago

To gain deeper insight into the trend we analyse the distribution of the selected terms during the period of the experiment. Starting from a quantitative measure such as the number of records per day, it is possible to see both the increasing number of records and the semantic changes in the terms used to report the story.
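The data behind a term-by-day chart like the ones below can be sketched as a small term x day matrix. The posts and the two terms here are hypothetical illustrations, not values from the dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical filtered posts: (publication date, post text).
posts = [
    (date(2017, 10, 16), "marcha por la desaparicion de Maldonado"),
    (date(2017, 10, 17), "hallaron un cuerpo en el rio Chubut"),
    (date(2017, 10, 17), "el cuerpo podria ser de Maldonado"),
]

def term_day_counts(posts, terms):
    """Build a {term: {day: count}} matrix, i.e. the data behind a
    'number of records per day' heatmap for the selected terms."""
    matrix = {t: Counter() for t in terms}
    for day, text in posts:
        low = text.lower()
        for t in terms:
            if t in low:
                matrix[t][day] += 1
    return matrix

m = term_day_counts(posts, ["cuerpo", "desaparicion"])
```

Each row of the matrix becomes one term's timeline; stacking the rows gives the heatmap.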

image

image (12 charts, one per selected term)

vecna commented 7 years ago

nice, well done @fedebarba, three things:

  1. it is probably better if you remove the "Number of Records" gradient on the right, and instead use Tableau to write the actual number in a small font inside the boxes?
  2. Why don't the tables all cover the 16 days?
  3. We should probably check whether any pattern emerges if we consider only the top 10-15 most productive sources, because I don't get any story from this. (Maybe there is no story at all, but...)
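Point 3 could be sketched as a simple restriction to the most productive sources; the records and source names here are hypothetical examples.

```python
from collections import Counter

# Hypothetical records: (source name, post id).
records = [
    ("Todonoticias", 1), ("Todonoticias", 2), ("Todonoticias", 3),
    ("LaNacion", 4), ("LaNacion", 5),
    ("BlogChico", 6),
]

def top_sources(records, n):
    """Return the n sources that published the most posts."""
    counts = Counter(src for src, _ in records)
    return [src for src, _ in counts.most_common(n)]

def restrict_to_top(records, n):
    """Keep only the records coming from the top-n sources."""
    keep = set(top_sources(records, n))
    return [r for r in records if r[0] in keep]
```

Re-running the term-distribution charts on `restrict_to_top(records, 10)` would show whether a pattern is hidden by the long tail of small sources.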
fedebarba commented 7 years ago

The distribution of terms across the days allows us to see how the trend can change due to a specific event. In particular we focus here on 17-10-17, the day Maldonado's body was found. These are the most used terms reporting the news in the days just before the discovery, when few sources were publishing stories about Maldonado.

image
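Extracting "the most used terms in a date window" can be sketched as below; the posts are hypothetical, and a real pipeline would also strip stopwords before counting.

```python
from collections import Counter
from datetime import date

# Hypothetical posts: (publication date, post text).
posts = [
    (date(2017, 10, 12), "nueva marcha por Maldonado"),
    (date(2017, 10, 15), "marcha y pedido de justicia"),
    (date(2017, 10, 18), "el cuerpo fue identificado"),
]

def top_terms(posts, start, end, n=10):
    """Most frequent words in posts published between `start` and
    `end` (inclusive)."""
    words = Counter()
    for day, text in posts:
        if start <= day <= end:
            words.update(text.lower().split())
    return words.most_common(n)
```

Running this once for the window before 17-10 and once for the window after it gives the two vocabularies being compared here.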

To better understand, here we have the distribution of feeds:

image

And the distribution of sources

image

To see how the trend is influenced by the algorithm we also need to understand how Maldonado's feeds are impressed. So here is a table where it is possible to see how feeds are divided among users' timelines.

image
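The table above boils down to counting impressions per (timeline, source) pair; a minimal sketch, with a hypothetical impression log:

```python
from collections import Counter

# Hypothetical impression log: (timeline/user id, source, post id).
impressions = [
    ("user_a", "Perfilcom", 10), ("user_a", "Perfilcom", 11),
    ("user_a", "LaNacion", 12),
    ("user_b", "Perfilcom", 10),
]

def impressions_per_timeline(impressions):
    """Count how many posts from each source appear in each timeline."""
    return Counter((user, src) for user, src, _ in impressions)
```

A homogeneous distribution is one where these counts stay close to 1 for every pair, as observed before 17-10.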

As we can see from the table above, the distribution of feeds across the users' timelines is very homogeneous. Every post appears just once in each timeline and there does not seem to be a preferred post; the only source with more impressions per timeline is 'Perfilcom'. This homogeneous trend is not present in the days after 17-10. It seems that once the news starts to be huge and the amount of posts grows, the bigger sources are preferred, with a percentage of impressions much bigger than the small sources'. From the table below it is possible to see how, after 17-10, some sources start to have a much bigger impact on the timelines than others. These sources are in particular: Todonoticias, Perfilcom, LaNacion, Pagina12OK.

image

As the two tables below show, the impact of small sources on the timelines seems to be related to the total number of posts. If the story is not at the centre of the public debate there is enough space on the timelines for all the sources, but once the news gains a huge impact on the public debate the competition to appear on the timeline seems to be unfair: big sources often gain a huge presence on the users' timelines.

image

image
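The "% of impressions" comparison behind these tables can be sketched as each source's share of all impressions; the counts below are hypothetical.

```python
from collections import Counter

# Hypothetical impressions: the source of each post seen on a timeline.
seen = ["Todonoticias"] * 6 + ["LaNacion"] * 3 + ["BlogChico"] * 1

def source_share(seen):
    """Percentage of all impressions captured by each source."""
    counts = Counter(seen)
    total = sum(counts.values())
    return {src: 100 * n / total for src, n in counts.items()}
```

Comparing the shares on a quiet day versus a high-volume day makes the big-versus-small-source gap described above measurable.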

The difference becomes even clearer if we focus on specific days. We choose 12-10, the day before 17-10 with the most records, and 18-10, the day the news about the discovery of the body broke.

image

image