supermemo / SuperMemoAssistant

A companion app for SuperMemo 17-18 which extends its functionalities through plugins.
https://www.supermemo.wiki/sma/
MIT License
195 stars 20 forks

Computing the confidence of information #93

Open alexis- opened 4 years ago

alexis- commented 4 years ago

ErgonomicFugitive> What you're asking for is Bayesian probability analysis, though that's always going to be highly approximate when you don't have access to the data sets themselves. Short of that, learning to interpret various measures of effect size would pay off well. You also need to identify the particulars of the research methodology to be able to assess how well it will transfer to your own needs. I don't know how familiar you are with research methods, but knowing a wide range of methodologies makes scientific papers much more coherent, and thus readable, and eases extract selection. If a paper is still incoherent, your best bet is to look through the technical terminology in the paper and learn that deeply first. That should also help you build a causal model to assess transferability.

The best safeguard against irreproducible research is to switch from a black-and-white model of information accuracy to a Bayesian probabilistic one. In such a system, your estimate of the likelihood that a hypothesis is true is adjusted in proportion to the volume * quality of the data, which in turn is inversely proportional to the likelihood of retraction. I've been toying with the idea of including rough probability distributions in my items, but since I already do that in my head, it's a lot of effort for little return.
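The update rule described above can be made concrete. Here is a minimal sketch (my own illustration in Python, not anything from the project), assuming the "volume * quality" of the evidence is summarized as a single weight in [0, 1] that tempers the likelihood ratio — weight 1 is a full Bayesian update, weight 0 leaves the prior unchanged:

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) from the prior P(H) and the two likelihoods."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

def weighted_update(prior, p_e_given_h, p_e_given_not_h, weight):
    """Bayesian update tempered by a volume*quality weight in [0, 1].

    A large, high-quality study (weight near 1) shifts confidence more
    than a small noisy one (weight near 0).
    """
    likelihood_ratio = (p_e_given_h / p_e_given_not_h) ** weight
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)
```

For example, starting from a 50% prior, evidence four times more likely under the hypothesis than under its negation raises confidence to 80% at full weight, and less with a tempered weight.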


AlexisInco> @ErgonomicFugitive Dealing with moving bodies of knowledge is problematic in SuperMemo. Your approach to estimating the confidence of a particular piece of information (= an item in SuperMemo) is interesting, and it is probably already one of the most efficient ways to deal with this. The issue is, of course, how to minimize the cost of applying it. And what information is worth the additional cost?

I am starting to develop several project and plugin ideas dealing with research in SuperMemo. I think your idea could be taken further and optimized by automating certain steps:

alexis- commented 4 years ago

ErgonomicFugitive> A semi-automation of inference could be accomplished, since probability-based statistics is largely algorithmic (especially compared to frequentist statistics). A plugin that handles that could take all the inputs required for Bayesian inference and run the calculations accordingly. There would need to be some way to limit the CPU usage and duration of probability updates, since they can sometimes grow way out of control, and it would be quite annoying for SuperMemo to lock up while it runs those calculations. For problems where not all inputs for Bayesian inference are available, the principle of maximum entropy can be applied instead. If you want to get really ambitious, there are tools for automatically parsing natural language to find claims, as used in http://www.vldb.org/pvldb/vol8/p938-dong.pdf There's also a public release of a powerful natural language AI, GPT-2: https://github.com/openai/gpt-2
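To illustrate the maximum-entropy fallback mentioned above: when all you know about an uncertain quantity is, say, its mean, the maximum-entropy distribution over a finite set of outcomes is an exponential tilting of the uniform distribution, and the tilting parameter can be found by simple bisection. This is a hypothetical sketch, not plugin code:

```python
import math

def maxent_dist(values, target_mean, lo=-5.0, hi=5.0, iters=200):
    """Max-entropy distribution over `values` subject to a fixed mean.

    The solution has the form p_i proportional to exp(lam * v_i);
    we bisect on lam (the Lagrange multiplier) until the mean matches.
    """
    def mean_for(lam):
        weights = [math.exp(lam * v) for v in values]
        z = sum(weights)
        return sum(w * v for w, v in zip(weights, values)) / z

    # mean_for is strictly increasing in lam, so bisection converges.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * v) for v in values]
    z = sum(weights)
    return [w / z for w in weights]
```

With no constraint beyond the natural mean (e.g. a fair die's 3.5), this recovers the uniform distribution; asking for a higher mean shifts mass toward the larger outcomes while staying as "non-committal" as possible.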

alexis- commented 4 years ago

Throttling option: https://github.com/tom-englert/Throttle.Fody
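Throttle.Fody weaves throttling into C# methods at compile time. The underlying idea can be sketched in a few lines (illustrative Python, not the library's API): drop calls that arrive too soon after the last accepted one, so that repeated probability-update requests cannot pile up and lock the UI.

```python
import functools
import time

def throttle(min_interval):
    """Decorator: silently drop calls arriving within `min_interval`
    seconds of the last accepted call (returns None for dropped calls)."""
    def deco(fn):
        last_accepted = [None]

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            if last_accepted[0] is not None and now - last_accepted[0] < min_interval:
                return None  # drop instead of blocking
            last_accepted[0] = now
            return fn(*args, **kwargs)
        return wrapper
    return deco
```

Dropping (rather than queueing) is the right fit here: if confidence recomputation is requested ten times in a second, only the first needs to run.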