Create basic brain pipeline

Create the basic pipeline for an AI brain.

This is going to get quite complex down the road, but we'll start with a fundamental core around which we can build specialized modules. Initially we may approximate or ignore intended functionality in order to rapidly prototype in a holistic fashion.

Webaverse is filled with AI of varying intelligence, complexity and specialization. Our agents should be continuous, adaptive, self-organizing, goal-oriented and self-preserving. Phenomenon like social organization should happen organically through natural equilibrium due to shared goals.

Coordinating multiple AI agents will touch on many disciplines such as game theory, decision theory, physics, statistics, philosophy, and cognitive science. Successfully executing our grand vision of an AI-empowered metaverse doesn't require mastery of these subjects, but a working knowledge provides a holistic framework for complex multi-agent dynamic systems.

The core model is rooted in the concept of embodied perception-action loops, according to the free energy principle. This principle models the behavior of coupled embedded systems. An embedded system is a system which exists inside of a larger system (like an agent in an organization). Coupled systems are systems whose outputs affect each others internal state, and thus behavior. A classic example of a coupled system is the solar system, in which the gravity (output) of each planet affects the trajectory (behavior) of each other planet.

The essential idea is that embedded agents seek to model the state of other coupled systems, such as the environment and other agents. The principle of least action emerges, as describing the state of other coupled systems allows an agent to properly predict, measure and react to a quality known as surprisal. Surprisal is the level of surprise at a given outcome, also expressed as the difference between expected and observed states.

A system which perfectly predicts a subsequently observed state can react using the least amount of energy. An example of this phenomenon is how your body calculates ahead of time how much force to apply when picking up an object. The heavier you predict the object to be, the more force you will apply. Have you ever misjudged this and went to push on a door that turned out to be very heavy? Or slammed a door shut accidentally because you didn't account for air pressure differentials caused by the A/C unit?

The extreme of this is known as the good regulator theorem. You need to regulate either your environment, or your behavior, in order to spend the right amount of energy whenever you open the door. You do this by evaluating beliefs and desires. Your model of the world emerges from these beliefs. Systems which do not care to optimize this spent energy quickly lose to more conservative systems. The ideal balance between available and spent energy rests at the edge of chaos.

If you slam open and shut every door you encounter, you will tire faster than others, and your environment will fall apart. This causes a runaway effect where it takes a continually increasing amount of energy to navigate the environment (such as the door frame breaking, increasing the strength required to operate the door). Now imagine if you also didn't model the location of the door and had to find it each time, or if you didn't model how to estimate distance given a visual image of the door.

This leads to an evolutionary advantage for selecting systems which minimize variational free energy, and thus perfectly model their parent systems. This is why animals have developed eyes, ears, tongues and such things which allow us greater statistical understanding of our environment. It's why we have developed hands and big brains, which allow us to organize, act on the world, and make long term plans which overcome entropy. It is why our brain has structures such as grid cells.

So why does all this nonsense matter for our AI models?

Well, imagine two AI agents engaged in conversation.

A naive implementation might have each agent generate responses in turn by evaluating a combination of recent conversation history and a global, shared context such as semantic labels of the environment and other agents. This will produce interesting results, but it's non-adaptive. The agents are not coupled, and their interaction does not change each other's long-term state.

A better implementation would be one where each agent maintains a model of the environment and other agents it knows about.

Instead of Alice making decisions based on semantic labels like

Bob has a mean temperament.

which cannot be modified locally and are totally decoupled from Alice's own experience, Alice makes decisions which combine global context with local models:

Bob is known for a mean temperament, but I think they are okay, because of the time they showed compassion for my friend Carol.

This highlights several models at work.

Alice understands that Bob has a reputation, but also factors in the difference between this reputation and their own observation, which ideally leads to a more accurate prediction of Bob's state and behavior.
Alice tracks their relationship with both Bob and Carol, as well as Bob's relationship with Carol. This includes knowing everyone's model of everyone else. This kind of relationship modeling has been studied in rats and other social species.
Alice makes two value judgements toward Bob: their usefulness to Alice (and thus Carol), and their moral standing. These judgements also depend on Alice's value judgements toward Carol: Here, Bob is only accepted by Alice because Bob is useful to Carol, who is important to Alice in some way.
A model for emotion/behavior, which leads to concepts such as compassion.
A model for social topology which advises Alice on how to shape their perceptions and actions to efficiently navigate encounters between Bob and Carol given these models.

So, we want to reduce our surprise between expected and observed outcomes, and we do that either by updating our model of the world (modifying a belief) or modifying reality to match our model by acting based on a desire. At a high level, the loop is as such:

Observe state of world via inputs (fed from the environment and other agents)
Modulate observed state by emotional state (sensory data is modified by states like anger, which itself arises from high degrees of surprise) to distill a perceived state.
Compare perceived state to internal models and determine surprise.
Determine course(s) of action to reduce surprise.
Modulate emotional state by action model (actions can affect internal state, for example: "calm down and focus")
Apply action model to the external world and internal models.
Observe new state of world and repeat.

This is an incomplete overview, but it's a good starting point. The next step is to break down each of these components and understand how they work in more detail, breaking out tasks to achieve the stories we wish to support.

This issue tracks the implementation of a basic pipeline which can eventually support these features.

webaverse-studios / webaverse

Create basic brain pipeline #111