tracking-exposed / eu19


data activism section #21

Closed vecna closed 5 years ago

vecna commented 5 years ago

thanks @berli0z for the translation

  1. Data activism

We are data activists. In the information society, whoever has the power to disclose or hide data can also influence our understanding of society and limit our agency. As the research that exposed the Volkswagen emissions scandal has shown, how data is collected matters so much that it can make the difference between a critical analysis and a company's press review. We, as free thinkers and researchers, want to be able to collect data from the bottom up: as users, from users, for users. In other words, not just for ourselves. We build tools that allow other researchers, or users themselves, to understand how the algorithm impacts their lives: in this challenge, the attack is personalized.

  2. So you heard Facebook has released some data to some researchers...

This week, news media ran a story about Facebook "opening its data to academics for the first time". Indeed, Facebook will allow researchers to study the social network's effect on elections, but only from 2017 onward (link: https://www.ft.com/content/82e0fcfa-69e1-11e9-80c7-60ee53e6681d). In the official document (link: https://items.ssrc.org/social-media-and-democracy-research-grants-grantees/ ), many projects are concerned with Facebook's impact on political discourse. These are actually not the first researchers to look at this issue: other research groups have obtained Facebook data before and used it to try to answer quite challenging questions about the impact social networks have on society. In our view, this announcement should be seen as an attempt by Facebook to meet the goals it promised in the context of the scrutiny it is under (where it promised to offer a better understanding of the abuses on the platform during electoral campaigns).

  3. Data from Facebook is not neutral.

Well, no data is neutral. What we want to address in this text is that research so far has not looked at abuses by the platform, only at abuses happening on the platform. The subject of our investigation, instead, is Facebook itself. According to Facebook, the object of research is the people who use it in "unethical" ways (fake news, propaganda, spam, hate speech, and so on). We believe the data Facebook provides carries a fundamental problem: it does not allow an understanding of the role of the platform itself, and therefore does not allow proper responsibility to be attributed to each actor in this phenomenon.

The data Facebook has disclosed in the past has always focused on the interactions that happen on its platform. This forces researchers to look at Facebook through Facebook's own eyes. Interaction is measured in terms of likes, comments, reactions in general, and other activities captured by apps (how much of this metadata is protected is something we cannot know). The interactions that happen on Facebook are the result of what people want, PLUS what the algorithm and the interface suggest they do. Facebook constantly changes these parameters, to optimize consumption or to run experiments (like the sadly infamous "Detecting Emotional Contagion in Massive Social Networks", https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090315 ). We cannot know the algorithms that "created" the data of the experiment's subjects: this variable is mixed in with all the others. This implies we are not understanding "Facebook", or "society", but just "how people used Facebook during that period". But why would we care about that? We should be understanding the dynamics that influence masses, and how the powers involved create a specific imaginary, or spread a specific belief (true or not). Understanding these mechanisms is necessary to attribute the right responsibilities.

"The medium is the message", said Marshall McLuhan provocatively. Indeed, Facebook's UX drives actions through emotional messages, usually visual ones, especially insofar as they stimulate reactions measurable as "engagement". This pushes toward seeing "active interaction" as meaningful, while excluding the possibility that passive exposure to information plays a role. Nobody can tell whether information that was "displayed" was also "consumed". Naturally, Facebook could have access not just to interaction data, but also to the behavior of users. We cannot expect Facebook to provide this data, because doing so would violate the GDPR (users would not have been informed beforehand that their personal data could be shared with third parties, as the regulation requires).

Publishing papers that use engagement as the exclusive metric of success or failure of a message or an advertisement reinforces the narrative of Cambridge Analytica and other data brokers. It stresses that to succeed on the social network one needs to engage its users, and this sustains and legitimizes the existence of a market for bots, users, and likes. Moreover, it reinforces the power of Facebook, which, as a monopoly, sells a promise of engagement to whoever buys its sole product: advertising, or "users' attention". But engagement is just one metric among many, prone to (automated) manipulation, and the result of several different variables.

  4. We have a different approach

We are researchers, and as such we should have a critical approach. That's why we cannot forget that Facebook's main objective is to maximize the attention (or time spent) of users and to "lock them in" to the platform. This doesn't necessarily imply providing good-quality information in the very content that is supposed to keep them attached to the screen, and it ignores the impact that the individual user experience has on the circulation of information and on the perception of public discourse. That's why we want to take a "braver path", one that others have hoped for but somehow did not manage to take [ https://knightcolumbia.org/sites/default/files/content/Facebook_Letter.pdf ]. We, as third parties, collect data independently of Facebook. This could allow researchers to formulate new questions about these phenomena: questions that should allow us, as a society, to understand social networks without being limited to the data Facebook is deliberately willing to share.

The interactions that take place on the platform are the result of three components of heterogeneous nature: content producers (who can be disinformers, commentators, friends with whom you're sharing content, bots, influencers, and so on); the algorithms, or platform logic (which decides what goes viral, what should be seen at the top of the timeline, and in general what is relevant for you and what is not); and "society", of which we are part. As such, we want to protect some categories and hold others responsible, understanding which behaviors are useful to our growth. Unfortunately, it is difficult to take all of these factors into account at the same time. We work on a technology that allows us to isolate the algorithms, or platform logic, of Facebook, so that we can distinguish the responsibilities of the company from the other phenomena involved. The analysis of the algorithm belongs to the techno- (or socio-)political sphere, a field of study being developed in these years that focuses on the social impact of technologies. We think that, by separating the variables, social scientists and other experts become the right people to judge the most human factors at play. Providing them with data to support data-driven policy-making is our objective. That's why we must recognize the multidisciplinary nature of the analysis to be done, and develop technologies that facilitate the reuse of data as much as the protection of personal data (which constitutes the foundation of our technology) allows.

Once we have data that finally shows us the effects of the algorithm, we can discover new metrics and formulate different, more contemporary questions about social networks. In research over the past year, for example, we developed two ways of measuring the algorithm. One is the percentage of content by media type (text, video, pictures) a profile gets exposed to; we documented and published our tests and released open data. It is interesting to note that the data Facebook released to its researchers never allowed for such analysis. Another metric is the number of times a post is shown to the same user: clearly, if certain content is shown again and again, the chance that the user will interact with it (and therefore engage with it) rises, while the repetition deprives the user of exposure to more diverse information.
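To make these two metrics concrete, here is a minimal sketch in Python. The record format is our assumption for illustration (one record per post impression observed on a profile's timeline); the actual fbtrex data model is richer.

```python
from collections import Counter

# Assumed, simplified record format (one record per observed impression):
# {"profile": "p1", "post_id": "abc", "media_type": "text" | "video" | "picture"}

def media_type_shares(observations, profile):
    """Percentage of a profile's impressions falling on each media type."""
    counts = Counter(
        o["media_type"] for o in observations if o["profile"] == profile
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {media: 100.0 * n / total for media, n in counts.items()}

def repetition_counts(observations, profile):
    """How many times each post was shown to the same profile."""
    return Counter(
        o["post_id"] for o in observations if o["profile"] == profile
    )
```

Comparing `media_type_shares` across profiles that follow the same pages is what allows differences to be attributed to the algorithm rather than to the users' own choices.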

  5. The fight against the abuse of political ads should have happened in 2015; now what?

Because of the media coverage and the imaginary created by the triad "Brexit, Donald Trump, and Russian bots", most research is now focused on disinformation, bots, and political advertising. The EU Parliament, as well as the American and UK legislatures, have questioned Mark Zuckerberg about political advertising, and Facebook's PR has spent two years claiming that political propaganda abuse is no longer allowed: signs that an institutional effort, and a response by the company, have already taken place. Today, a determined antagonist who wanted to abuse the platform would have had three years to discover new ways and tools to do so. Facebook's promise to protect users is not unconditional; it is focused on the problems this political class has understood and for which it wants to try to hold Facebook accountable.

Data should be protected because, clearly, if it were publicly accessible it could be employed in further abuses. Furthermore, we need to protect it because most companies and businesses around data analysis focus on studying the behavior of users for targeted advertising, or marketing purposes in general. But the way Facebook discloses its data to researchers is not acceptable either. How can anybody be sure the data is correct? How can it be evaluated by the subjects themselves? Which procedure of refinement or selection did Facebook apply when extracting the dataset? And why should we trust a single group of researchers to elaborate the inputs that will inform the development of public policy?

According to the "European Data Commons" [section 2.2.2 of https://diem25.org/wp-content/uploads/2019/03/Technological-Sovereignty-Green-Paper-No-3.pdf], data should be available to whoever wants to use it for research, as long as the logic used to query the database is privacy-preserving. This is not easy to "formalize" as code, so every query method against the database requires an impact assessment. What must be guaranteed is the possibility to analyze phenomena, but not individuals. We succeeded in building these mechanics, although we must still consolidate a method that guarantees this protection while promoting research without interfering with it.

With a bit of abstraction, we can see that the field of conflict is the freedom of users to exercise their will: a problem of self-determination. We have become used to accepting that a small part of our information feed is advertised content, but what we want to say with our analysis of the Newsfeed algorithm is that all of its content is organized to serve its advertising system. As a consequence, we should not consider it the product of our will. Even if we did freely choose our friends and our pages, Facebook's freedom to show us what it wants is far more prominent. We should apply these considerations to every platform that "personalizes the experience". The field of struggle becomes the personal curation algorithms. Hate speech, political propaganda, misinformation: these are only a few of the possible objects of research we can use to address the root cause. That's why we should not be surprised that other mechanics (such as friend recommendations, content going viral, the number and type of comments selected to appear) could be controlled by Facebook or by other actors (with their own political agendas) that have the knowledge and the technological means to do so.
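As one possible sketch of the "phenomena, not individuals" rule (our assumption for illustration, not the project's actual query interface), an aggregate query can refuse to return any group small enough to single people out:

```python
MIN_GROUP_SIZE = 20  # assumed threshold; in practice set per impact assessment

def aggregate_query(records, group_key, min_group=MIN_GROUP_SIZE):
    """Count records per group, suppressing groups that are too small.

    Answers questions about phenomena ("how often does X occur?")
    while refusing to return buckets that could identify individuals.
    """
    counts = {}
    for record in records:
        key = record[group_key]
        counts[key] = counts.get(key, 0) + 1
    return {key: n for key, n in counts.items() if n >= min_group}
```

A threshold like this is only the simplest privacy-preserving rule, which is why the text insists that every new query method needs its own impact assessment.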
Among our initiatives there is also research into alternative methods of accessing information and public discourse, such as the fbtrexRSS app, which allows users to see content in relation to its context (or semantic topics), as long as it is "of public interest". This is done in a horizontal way, completely independent from any "filter bubble": the algorithm, even a really simple one, stays under the control of the people using it. Even if this is outside the scope of the research, we want to promote creative reuse of the data, as long as it is privacy-preserving. This guarantees a wider variety of interests and therefore more diversity in the representative sample (a limitation recalled here: https://parameters.ssrc.org/2016/07/there-arent-any-rules-on-how-social-scientists-use-private-data-heres-why-we-need-them ).
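A feed "under the control of people" can indeed be this simple. The sketch below (hypothetical field names, not the actual fbtrexRSS code) selects by topic and sorts chronologically, so every reader of the same topic sees the same list:

```python
def topic_feed(posts, topic):
    """A deliberately simple, non-personalized feed.

    Every reader who asks for the same topic gets the same posts,
    newest first: no engagement ranking, no per-user model.
    """
    matching = [p for p in posts if topic in p["topics"]]
    return sorted(matching, key=lambda p: p["published_at"], reverse=True)
```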

Note: the RSS application is not part of this research, but of the free software project growing alongside it. If the software is protected by a collective license, it can more easily guarantee the collectivization of this data and a collaborative, more complete review of it.

vecna commented 5 years ago

thanks @lrnzctld