
(proposal) proposed updates for the first paper related to hyperparameter choice and uncertainty #1260

Open · sergeivolodin opened 1 year ago

sergeivolodin commented 1 year ago

I finally understood how to describe, somewhat more formally than before, what worried me so much about hyperparameters two years ago. I did not describe it well then, and I feel it's important to do it now, because it showcases a specific area where I feel there could be a separate line of research. I also feel the paper could discuss it, as otherwise it may provide a false sense of security.

The first paper presents Figure 8, which shows how different model hyperparameters result in different videos being ranked at the top (https://arxiv.org/abs/2107.07334). The paper then explains that in the limit of infinite data this is less of a concern. I agree.
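
To make the sensitivity concrete, here is a toy sketch in Python. This is not Licchavi or any actual Tournesol code, just a hypothetical shrinkage model: with few ratings, the choice of the regularization hyperparameter alone decides which video ranks first, while with many ratings its effect washes out.

```python
# Toy illustration, NOT Tournesol's actual model: `lam` is a shrinkage
# hyperparameter, equivalent to adding `lam` pseudo-ratings of 0.
ratings = {
    "new_video": [9, 9],           # few, enthusiastic ratings
    "established_video": [6] * 6,  # more numerous, moderate ratings
}

def score(rs, lam):
    # Regularized mean: pulls sparsely-rated videos toward 0.
    return sum(rs) / (len(rs) + lam)

for lam in [0.0, 5.0]:
    top = max(ratings, key=lambda v: score(ratings[v], lam))
    print(f"lam={lam}: top-ranked = {top}")
# lam=0.0: top-ranked = new_video
# lam=5.0: top-ranked = established_video
```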

However, there are specific cases in information propagation that are both important and short on data. Specifically: novel events.

Example 1: the "lab leak" COVID theories were common during the pandemic, first dismissed and later formally investigated. There was no reliable data about them for a long time, simply because proper research takes time.

Example 2: imagine some newly created research nonprofit claims that some country has built a military satellite with a powerful laser that can destroy objects on the ground. We don't know if it's true. There are many uncertainties. How reliable is this nonprofit? Its incentives and funding sources need to be analyzed by someone with expertise in that area. How likely is it that such lasers are even possible? That requires an analysis by someone knowledgeable about lasers. How likely is this country to use the satellite to dictate its terms? That requires an analysis by someone who understands politics.

Before proper research and analysis are done, we as humanity don't really know how true a video about a novel event is. But we still have to decide whether to recommend this content. In example 2, wrong decisions have downsides in both directions. If the satellite exists and the country plans to use it maliciously, people need to take shelter. If the satellite does not exist, unnecessary panic will disrupt life and create suffering.

It's important here that we cannot just "wait it out and decide whether to recommend the video later": if the video does not become popular, it's unlikely that the people who could do the proper research and analysis will do it, simply because they won't know about the claim. In example 1, if all videos discussing the "lab leak" hypothesis had been de-recommended, it would have been less likely for this hypothesis to be properly investigated.

I wrote about this type of uncertainty in the document linked below, showing that it's important to have different viewpoints (I feel that computer science's theories of decision making often disregard cultural aspects, so I'm adding this for context). Sometimes it's not that important whether a particular view is well-presented, well-argued, or well-formed. Sometimes a poorly presented view sparks interest in a topic and leads to a proper conclusion with good presentation and argumentation. From the point of view of a single Bayesian reasoner that view might be pointless, but in the context of cultural learning it makes a lot of sense. See https://docs.google.com/document/d/1MD_SmyMVEvoS42Fkgcn2VdblCy3l1gKFY1iq62WXWYA/edit?usp=drivesdk

Simply put, these first recommendation decisions, made when uncertainty is unusually high, do affect the future: whether that uncertainty will be reduced by doing more research, and which actions people take.
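
To illustrate the feedback loop, here is a minimal hypothetical sketch. Tournesol does not use this, and UCB is just one textbook way to formalize the explore/exploit trade-off I'm pointing at: ranking by a point estimate alone never surfaces a barely-rated novel video, so the ratings and research that could reduce its uncertainty are never produced; an explicit uncertainty bonus gives it some exposure.

```python
import math

# Hypothetical sketch (not Tournesol's algorithm; UCB is just one standard
# formalization of the explore/exploit trade-off described above).
videos = {
    # name: (mean_rating, num_ratings)
    "well_studied": (0.7, 500),
    "novel_claim": (0.5, 3),  # e.g. a video about a brand-new event
}
TOTAL = sum(n for _, n in videos.values())

def point_estimate(mean, n):
    # Rank by the estimated score alone; `n` is ignored.
    return mean

def ucb(mean, n, c=1.0):
    # Add an uncertainty bonus that shrinks as ratings accumulate,
    # so barely-rated items get some exposure and thus generate data.
    return mean + c * math.sqrt(math.log(TOTAL) / n)

for rule in (point_estimate, ucb):
    top = max(videos, key=lambda v: rule(*videos[v]))
    print(f"{rule.__name__}: recommend {top}")
# point_estimate: recommend well_studied
# ucb: recommend novel_claim
```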

This is a different "mode of operation" from the one Tournesol targets: reliable judgements based on a lot of data, once analysis and research are available. This high-uncertainty, recent-events mode is also important for recommendation algorithms, and I believe it cannot be left out when creating a truly better recommendation algorithm.

I was really worried about this a few years back: when seeing and reflecting on online posts about politics and everything else, I felt that such decisions during the first days after something novel happens are extremely important, and at the same time I felt dismissed when voicing those concerns.

It is true that with more reliable data the algorithm will uncover the truth; it's just that the existence of such reliable data depends on how the algorithm makes decisions while no such data is available yet :)

I understand this is a bit out of the scope of Tournesol's mission (collecting a dataset for the cases where data based on research and analysis is available), yet I feel this topic is important and could be discussed in the paper.

Otherwise I feel that the paper provides a false sense of security by arguing that "with more data the problem of hyperparameters disappears". For some cases that argument does not apply: specifically, the case described above, where there is uncertainty about whether it's a good idea for a video to go viral, and where the decision affects both people's immediate safety and whether more research will be done to reduce the uncertainty and produce more data.

Thank you.

glerzing commented 1 year ago

Hi Sergia,

I'm not from the core team, but from what I heard we may make a new version of the white paper someday that we will try to publish. Currently, though, we use Mehestan and Bayesian statistics to aggregate the scores (as described in https://arxiv.org/abs/2211.01179), so I'm not sure Licchavi will appear in this new paper (it should be mostly about the dataset), and I don't think the old paper will be updated.

In the current algorithm, the final score is like a median of all the ratings (biased towards 0, especially when there are few contributors), so it really depends on the opinion of the majority. The first raters of a video are usually positively biased, notably because they follow the video's channel. Then, if the video does not look reliable or does not look like it has an important positive social impact, it usually gets bad ratings and loses visibility.
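
Roughly, the intuition behind the zero bias (a toy sketch, not the actual Mehestan/QrMed implementation): penalizing the aggregate score x with W·|x| in an L1 objective is the same as appending W pseudo-ratings of 0 before taking the median, so with few contributors the score stays close to 0.

```python
import statistics

def zero_biased_median(ratings, w=2):
    # Minimizing W*|x| + sum_i |x - r_i| over x is equivalent to taking
    # the median of the ratings with W pseudo-ratings of 0 appended.
    return statistics.median(list(ratings) + [0] * w)

print(zero_biased_median([8, 9]))           # 4.0: two raters, pulled toward 0
print(zero_biased_median([8, 9, 7, 8, 9]))  # 8: more raters, the pull fades
```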

For example, people who follow far-left political channels may give high ratings to their videos, which gives them visibility for a moment. But when these videos reach a more general audience, they tend to get negative scores and lose visibility. Fun videos about unimportant subjects may follow the same pattern, while videos about ecology, social media, or against meat consumption usually continue to get good scores. So ultimately, the top recommendations are important and reliable according to our contributors, but not very diverse. Speculative videos are not necessarily considered unreliable if they have a good rationale, are not overconfident, etc. (e.g. https://tournesol.app/entities/yt:GuTgfnkILGs).