mindbeam / mindbase

A database for convergent intersubjectivity
Apache License 2.0

Discussion of the FuzzySet Signal-To-Noise-Ratio Problem #8

Open dnorman opened 4 years ago

dnorman commented 4 years ago

From #7, we discussed the FuzzySet signal-to-noise-ratio problem.

One thing within the experimental code which is almost certainly wrong is the way unions are being performed across the output of each candidate Analogy interrogation. We must explore a more appropriate means of composing these candidate Analogy interrogation outputs in a weighted fashion, rather than simply taking the maximum degree of each discrete matching member into the final output FuzzySet.

The current code does exactly that, which is problematic: we likely don't want Members from a small subset of candidate Analogies with a high degree of matching to compete on equal footing with a corpus of thousands with a low matching degree, as a simple maximum-degree-of-membership union would allow. However, we also don't want to attenuate the signal of such a well-matching subset of candidate Analogies, as a simple weighted score would. Presumably there is some middle ground to be found whereby these considerations are balanced: not a simple weighted score, and not a maximum-degree of FuzzySet membership either.
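To make the tension concrete, here is a small free-standing numerical sketch (plain Rust, not the actual mindbase types; the population sizes and degrees are made up for illustration). It shows how max-union and plain averaging each fail in opposite directions when a few strong matches sit alongside a large weak corpus:

```rust
fn main() {
    // Hypothetical scenario: 3 candidate Analogies match a member strongly,
    // 1000 match it weakly.
    let strong = vec![0.9_f64; 3];
    let weak = vec![0.05_f64; 1000];
    let all: Vec<f64> = strong.iter().chain(weak.iter()).cloned().collect();

    // Max-union: the weak corpus contributes nothing, but a single spurious
    // high-degree match would dominate the output just as easily.
    let max = all.iter().cloned().fold(0.0_f64, f64::max);

    // Plain average: the weak corpus drowns out the strong subset entirely.
    let mean = all.iter().sum::<f64>() / all.len() as f64;

    println!("max = {max}, mean = {mean:.4}");
}
```

Max lands at 0.9 regardless of the thousand weak matches, while the mean collapses toward 0.05; the middle ground the issue asks for would sit somewhere between these two behaviors.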

Let's imagine we interrogate three candidate Analogies and are left with the following Symbols, which we are constructing manually here, but which would typically be analogy interrogation outputs created by the query tree.

    let io1 = sym![Hot1~0.3,Sticky1~0.2];
    let io2 = sym![Hot1~0.5,Muggy1~0.9];
    let io3 = sym![Hot1~1, Sticky1~1];

    // union the interrogation outputs together
    let mut u = Symbol::null();
    u.union(io1);
    u.union(io2);
    u.union(io3);
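For reference, the maximum-degree union the current experimental code performs can be sketched with a minimal stand-in type (the names and shape here are illustrative, not the real mindbase `Symbol`/FuzzySet API):

```rust
use std::collections::HashMap;

// Minimal stand-in for a FuzzySet: member id -> degree of membership in [0, 1].
#[derive(Default, Debug)]
struct FuzzySet {
    members: HashMap<String, f64>,
}

impl FuzzySet {
    fn from_pairs(pairs: &[(&str, f64)]) -> Self {
        FuzzySet {
            members: pairs.iter().map(|(k, d)| (k.to_string(), *d)).collect(),
        }
    }

    // Maximum-degree union: each member ends up at the highest degree
    // seen in any input set (what the current code does).
    fn union(&mut self, other: &FuzzySet) {
        for (k, d) in &other.members {
            let e = self.members.entry(k.clone()).or_insert(0.0);
            if *d > *e {
                *e = *d;
            }
        }
    }
}

fn main() {
    let io1 = FuzzySet::from_pairs(&[("Hot1", 0.3), ("Sticky1", 0.2)]);
    let io2 = FuzzySet::from_pairs(&[("Hot1", 0.5), ("Muggy1", 0.9)]);
    let io3 = FuzzySet::from_pairs(&[("Hot1", 1.0), ("Sticky1", 1.0)]);

    let mut u = FuzzySet::default();
    u.union(&io1);
    u.union(&io2);
    u.union(&io3);

    println!("{:?}", u.members);
}
```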

What do we want to have in the end, and why?

Should include the max of each degree?

    u is [Hot1~1,Muggy1~0.9,Sticky1~1]

This doesn't seem very good. We want small signals to be boosted, but this might be too much.

Hot1 is present in all three input Symbols. Should we average them?

    u is [Hot1~0.6,..]

What about Muggy1, and Sticky1 - which are only present in some of the inputs? Should we treat the sets which lack them as degree 0, and include those in the average?

    u is [Hot1~0.6, Muggy1~0.3, Sticky1~0.4]

Or should we average them individually based on their non-null set membership?

    u is [Hot1~0.6, Muggy1~0.9, Sticky1~0.6]
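The three options above can be checked mechanically. A small free-standing sketch (plain Rust over the degrees from the example, not the mindbase types) computes max, average-over-all-sets (absence treated as degree 0), and average-over-present-sets for each member:

```rust
fn main() {
    // Degrees per member across the three interrogation outputs.
    // A member absent from a set simply has no entry in its degree list.
    let inputs = [
        ("Hot1", vec![0.3_f64, 0.5, 1.0]),
        ("Muggy1", vec![0.9]),       // present in io2 only
        ("Sticky1", vec![0.2, 1.0]), // present in io1 and io3
    ];
    let n_sets = 3.0;

    for (name, degrees) in &inputs {
        let max = degrees.iter().cloned().fold(0.0_f64, f64::max);
        // Average over all 3 sets, treating absence as degree 0.
        let avg_all = degrees.iter().sum::<f64>() / n_sets;
        // Average over only the sets where the member is present.
        let avg_present = degrees.iter().sum::<f64>() / degrees.len() as f64;
        println!("{name}: max={max:.1} avg_all={avg_all:.1} avg_present={avg_present:.1}");
    }
}
```

This reproduces the three candidate unions above: max yields [Hot1~1, Muggy1~0.9, Sticky1~1], zero-filled averaging yields [Hot1~0.6, Muggy1~0.3, Sticky1~0.4], and present-only averaging yields [Hot1~0.6, Muggy1~0.9, Sticky1~0.6]. Note that present-only averaging rewards Muggy1 for being rare, which is arguably the opposite of down-weighting a thin signal.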

Let's take a step back. What do each of these input symbols represent?

Each symbol represents one side of an Analogy which a trusted (ground) Agent previously Claimed. Each member of that symbol had its degree determined by some prior query, presumably by that Agent, wherein a partial match of claims occurred.

This could come about in a number of different ways, but the simplest construction of events is:

    let a1hot: Symbol = Agent1.query("Hot");
    // TODO - construct a full chain of events (including genesis Claims) by which Symbol members of a degree <1 are constructed, and then Claimed as new Analogies
    // From there we can determine the most prudent implementation of union, such that we optimize the signal-to-noise ratio