memory activation and decay

draggett commented 3 years ago

Unlike conventional database systems, cognitive agents just want to recall what is most important based upon experience. This is similar to web search engines which seek to provide the results that are likely to be most relevant given the words in the search query.

Forgetting as intrinsic memory decay or as interference from other memories, or some combination of the two?
Exponential decay over time provides good fit to lab data
But can also be ascribed to interference from new memories
Priming effect on related memories as spreading activation
We can recall memories from many years ago given the right cues
New memories lack strong evidence for their lasting value
Such evidence has to be acquired with experience
What’s the most effective model for all of these points?

Underwood (1957) showed that memory loss is largely attributable to interference with other memories. Memories can thus be recalled after an interval of many years provided that the interference is small. This reflects experience in selecting memories that have been more valuable.

For ACT-R, the decay of activation is only one component of the activation equation. There is also a context component to activation which works to increase the activation of items based on the current context. Thus, even chunks which have decayed significantly over time can have activations above the threshold if they are strongly related to the current context.

Proposed approach for the chunks specification

Chunks have parameters for an activation level and a timestamp
Activation decays over time like a leaky capacity losing its charge
Recalling or updating a chunk boosts its activation level
Boost is weaker for closely spaced rehearsals – aka the spacing effect – and is based on the Logistic function
Decaying wave spreads through linked chunks to boost related concepts
Stochastic recall - chunks with higher activation levels are more likely to be recalled, but sometimes weaker chunks are recalled in place of stronger chunks

Spreading Activation

Why is it easier to remember items in a group for groups with fewer items?
A wave of spreading activation provides one possible explanation
Activation of one item in the group spreads to other items in the same group following property links in both directions
The amount of wave activation for each item is inversely related to the number of items in the group
What is the underlying computational model for pulsed neural networks?

Here is an example:

# items belonging to group animals
item {word dog; group animals}
item {word horse; group animals}
item {word cat; group animals}

Remembering the item for dog boosts the chunk for the group (animals) and spreads out to boost the other items in that group
Does this depend on the property (in this case group) being the same?
How can we implement this efficiently on conventional computers?

One implementation strategy is to have one index mapping from chunk IDs to chunks, and another index from chunk IDs to the set of chunk IDs for chunks that have the given ID as a property value. A further index maps chunk types to the set of IDs for chunks with that type. This requires care to ensure that the indexes are kept up to date in respect to adding and removing chunks from a graph, as well as when the chunk type or chunk properties are updated.

Here is an implementation in JavaScript:

    // To mimic human learning and forgetting, the activation
    // of a chunk is modelled as a leaky capacitor where charge
    // is injected each time the chunk is recalled or updated,
    // and then decays over time. Related chunks are primed with
    // a fraction of the injected charge being divided across
    // linked chunks in a wave of spreading activation until a
    // cut-off threshold is reached.

    // This algorithm follows links unidirectionally from
    // properties to values, and needs to be extended to work
    // bidirectionally using an new index that lists chunks
    // with a type or property value equal to the given ID

    graph.activate = function (chunk) {
        // parameters for spreading activation
        const base = 1.0;
        const fraction = 0.5;
        const cutoff = 1E-5;
        const tau = 60000;  // arbitrarily 1 minute as mS

        // The spacing effect is that massed presentations have
        // reduced novelty, and are less effective for learning.
        // The logistic function is used to mimic the effect,
        // mapping the time interval since the chunk was last
        // recalled or updated to the boost in its activation,
        // see: https://en.wikipedia.org/wiki/Logistic_function
        function logistic () {
            return (1 + Math.tanh(x/2))/2;
        };

        function prime (chunk, boost) {
            chunk.activation += boost;

            // spread activation through linked chunks
            if (boost > cutoff) {
                // determine list of linked chunks
                let chunks = [];
                let props = chunk.properties;
                for (let name in props) {
                    if (props.hasOwnProperty(name)) {
                        let id = props[name];
                        if (typeof (id) === "string" && id[0] !== '"') {
                            let c = graph.chunks[id];
                            if (c)
                                chunks.push(c)
                        }
                    }
                }

                // prime the linked chunks
                if (chunks.length) {
                    boost = boost*fraction/chunks.length;

                    for (let i = 0; i < chunks.length; ++i) {
                        prime (chunks[i], boost);
                    }
                }
            }
        }

        let now = Date.now()
        let boost = base;

        if (chunk.timestamp)
            boost *= logistic(Math.log((now - chunk.timestamp)/tau));

        chunk.timestamp = now;
        prime(chunk, boost);
    }

    // used as part of stochastic recall of chunks where
    // where stronger chunks are more likely to be selected
    // This implementation uses the Box–Muller algorithm
    graph.gaussian = function (stdev) {
        const epsilon = 1E-20;
        const TwoPI = 2 * Math.PI;
        let u1, u2;
        do {
            u1 = Math.random();
            u2 = Math.random();
        } while (u1 < epsilon);
        return stdev*Math.sqrt(-2*Math.log(u1))*Math.cos(TwoPI*u2);
    };

Chunk recall first identifies matching chunks and for each match, applies gaussian noise to the chunk's activation level, and selects the matching chunk with the highest resulting score. The selected chunk is activated as above. Selection fails if the score is below a given threshold.

The gaussian distribution is centred around zero and drops off for negative and positive numbers. The graph.gaussian function above on average returns values close to zero, and more rarely large negative or positive numbers.

gaussian distribution

To apply gaussian noise to an activation level, multiply the level by e raised to the power of the noise value computed from graph.gaussian. The standard deviation should be a system wide constant.

For the memory test task, the successfully recalled items in the test are treated as an iteration (see @do next). Rules then have access to the number of items recalled as well as to the sequence of items. Items may failed to be recalled if their activation level is low, or if the stochastic noise depresses the score below the threshold.

Summary

Human memory is functionally modelled in terms of a graph of chunks where each chunk is associated with an activation level and a timestamp. Activation decays exponentially with time (like a leaky capacitor), but is boosted by recall or update, and via spreading activation in both directions through links between chunks. Recall is stochastic with noise being applied to the chunk activation level before comparison with a cut-off threshold.

draggett commented 3 years ago

Further consideration suggests that spreading activation should not be performed when adding a chunk to a graph. In respect to the group of items example, if spreading activation was done when adding the chunks, then the oldest items in the group would get higher activation levels than the youngest items, which seems wrong! It therefore seems better to only perform spreading activation when updating or accessing an existing chunk.

It should be safer and easier to use getters and setters for chunk properties when it comes to maintaining the index from chunk IDs to chunks with that ID as a property value. A counter argument is that it might be better to use explicit functions as this will make it easier to port the chunks library to programming languages that don't support getters and setters.

The index is only updated if the chunk graph property is defined. Some possible complications:

Iteration over chunk properties would need to be handled differently as JavaScript's built-in for in wouldn't work.
Updating chunk buffers shouldn't alter chunk activation, and a safe way to handle this is to set chunk.graph to undefined when copying a chunk to a module chunk buffer.
Spreading activation should not be invoked when creating chunks during parsing.
Parsing needs to initialise the activation levels, timestamp and indexes.
Serialisation needs to handle the activation level and timestamp as the inverse of the parser processing.
@do next needs to apply stochastic selection prior to pushing chunks to the chunk list, see graph.doGet.

It is likely that additional and more complicated indexes will be needed to support finding chunks that match the chunk type and properties in chunk buffers and rule conditions. Further work is anticipated as part of scaling up chunk database performance for larger databases.

draggett commented 3 years ago

For memories that once were really active, but haven't been used for a long time, these may still be of value in the right context. The above approach relies on spreading activation to boost old memories, but is that always sufficient? One potential approach would be to also keep track of how many times a given memory has been used. How should this be combined with the decaying model of activation in respect to memory recall? How could we design experiments to test this with human subjects?

tidoust commented 3 years ago

Discussed today during Cogai CG call. Starting point would be to add a @strength and @age internal properties, where @strength is a number that e.g. starts at 1 and can be increased/decreased over time (through activation and spreading), and @age is the time in "seconds" since chunk was last accessed/updated, etc.

draggett commented 3 years ago

The current version of the JavaScript library uses @ago in place of @age as a better match to the idea that a chunk was last updated the given number of seconds ago before the graph was serialised. In the graph API, this is exposed as chunk.lastAccessed, which gives the number of seconds ago that the chunk was last accessed or modified.

@use indicates the number of times a chunk has been accessed modulo the spacing effect, i.e. it is one plus the sum of the boosts the chunk has had, each time it was activated. The boost is based on the logistic function, tending to zero for short time intervals, and to one for long time intervals. Accesses that are close together are a weaker indication of long term utility than when they are further apart. This is exposed via the graph API as chunk.usage and models the long term utility of a chunk.

Consider animals that need to adapt their behaviour to the annual seasons. Knowledge needed for each season lies dormant for much of the year. The decay rate for chunk strength is inverse proportional to chunk usage. This balances the needs for short and long term memories, and reflects underlying changes in synapses due to neurotransmitters and neural plasticity.

w3c / cogai