gjacobrobertson commented 8 years ago

This is a big pull request that encompasses a large number of changes and significantly refactors many parts of the codebase. The overarching goal of these changes is to make it easier for experimenters to design new experiments and AI algorithms, primarily in the NERO environment. I will document the proposed changes in subsequent comments.

gjacobrobertson commented 8 years ago

Design Goals

Loose Coupling

Python module and class responsibilities should be clear and contained, and components should depend only on the interfaces between them, not on their implementations.

Implementing a new NERO agent AI should consist solely of updating NERO/agent.py.
There should be a consistent interface for team-level AIs (such as RTNEAT).
There should be a consistent method of serializing and deserializing NERO agents and teams that is flexible enough to accommodate new Agent and Team classes.
The NERO environment should completely define the environment without reference to the NERO module.
The NERO environment should be agnostic to agent implementations.
The NERO module should define an interface for interacting with the environment.
The C++ RTNEAT AI should not depend on environment implementations.

Flexible Experimentation

Module and experiment designers should be able to use existing algorithms in a flexible way

Experiments making use of RTNEAT should have complete control over the fitness function being used.

NERO Learning Improvements

Existing NERO agents should converge towards player-defined goals reasonably quickly, and the user interface should provide meaningful insight into the progress being made.

The NERO environment should report weighted reward instead of raw fitness values. Reward weights are functionally environmental state.
The NERO environment should report bounded rewards for potential use in algorithms that require bounded rewards (e.g. RL techniques with neural networks)
An agent's reported "time alive" hint should not exceed its maximum lifespan.
For any non-zero setting of the reward weights, agent fitness should be diverse across the population over time, i.e. most agents should not have zero fitness most of the time.
Reward values should be consistent across reward weight configurations, i.e. changing reward weights should not change the scale or bounds of the rewards given, which could invalidate prior or current settings.
Agents should be capable of learning and evolving in a team vs. team battle scenario

Code Quality

Code should be readable and maintainable

Commented out code snippets, unused functions, and unreachable sections of code should be removed. That's what version control is for.

gjacobrobertson commented 8 years ago

Changes

NERO Agent Lifecycle

Any agent AI is completely encapsulated by a subclass of NeroAgent.
RTNEATAgent has been renamed to NEATAgent, which represents any agent that is backed by a C++ NEAT::Organism. It acts by feeding its sensors into the inputs of a neural network, activating the network, and interpreting the outputs as an action vector. It is agnostic to population-level AI, i.e. it can be used on its own, as part of RTNEAT, as part of generational NEAT, or in any other team.
agent.py provides factory methods for creating NeroAgent objects from string descriptions of the AIs to be used, e.g. agent.factory('neat') will return a new NEATAgent.
Agent brains can be mapped to existing simulation bodies. Previously, to create a simulation body with a given brain AI, you had to write an XML template named like steve-blue-rtneat.xml that described how to instantiate a new RTNEATAgent onto a blue robot model, so users had to create a new template for every combination of 3D model, team color, and agent AI. Now you can use a template such as steve-blue.xml to create a simulation entity, instantiate an agent brain yourself, and then wire them together with a call to SimContext::InitObjectBrain.
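A minimal Python sketch of the agent side of this design, assuming a hypothetical AGENT_AIS registry and a hypothetical act(sensors) method. The real NEATAgent wraps a C++ NEAT::Organism through the Python bindings; here a plain callable stands in for the network, and the brain-body wiring via SimContext::InitObjectBrain is not shown.

```python
from typing import Callable, Dict, List, Sequence, Type


class NeroAgent:
    """Base class: each agent AI is one subclass of this (sketch)."""

    def act(self, sensors: Sequence[float]) -> List[float]:
        raise NotImplementedError


class NEATAgent(NeroAgent):
    """Feeds sensors into a network and returns its outputs as the action vector.

    `network` is a stand-in for a NEAT::Organism's network: any callable that
    maps a list of inputs to a list of outputs (assumption for this sketch).
    """

    def __init__(self, network: Callable[[List[float]], List[float]]):
        self.network = network

    def act(self, sensors: Sequence[float]) -> List[float]:
        # Activate the network on the sensor values; outputs are the actions.
        return list(self.network(list(sensors)))


# Hypothetical registry mapping AI description strings to agent classes.
AGENT_AIS: Dict[str, Type[NeroAgent]] = {"neat": NEATAgent}


def factory(agent_ai: str, *args) -> NeroAgent:
    """Create an agent from a string description, e.g. factory('neat', net)."""
    try:
        cls = AGENT_AIS[agent_ai]
    except KeyError:
        raise ValueError(f"unknown agent AI: {agent_ai!r}") from None
    return cls(*args)
```

For example, factory('neat', lambda xs: [sum(xs)]).act([0.5, 0.25]) yields [0.75]. Because the registry is the only place that knows about concrete classes, adding a new agent AI means adding one subclass and one registry entry.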

NERO Team Lifecycle

A new NeroTeam class represents a set of agents in an environment together with any population-level AI for that team. It can be instantiated on its own as a plain team of agents with no evolution or other population-level methods.
teams.py provides factory methods for creating Team objects. The team interface:
- create_agent(agent_ai, *args) calls the agent factory methods to instantiate a new agent and add it to the team
- create_agents(agent_ai) fills the entire population with newly created agents
- is_episode_over(agent) hooks into the environment lifecycle
- reset(agent) hooks into the environment lifecycle
- kill_agent(agent) marks an agent as dead
- reset_all() resets an entire team to a playable state, resurrects dead agents, etc.
- is_destroyed() returns True iff the team is non-empty and all agents on the team have been killed
- start_training() starts any training AIs
- stop_training() stops any training AIs
RTNEATTeam represents a NEAT::Population managed by the RTNEAT algorithm. It uses is_episode_over(agent) to hook agents killed periodically by the RTNEAT algorithm into the simulation lifecycle, and reset(agent) to replace organisms killed by RTNEAT. For generational NEAT, reset_all() advances the population to the next epoch.
RTNEATTeam expects the agents on the team to be instances of NEATAgent (or any subclass thereof).
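A rough Python sketch of a plain NeroTeam with the interface listed above. The constructor parameters agent_factory and pop_size are hypothetical, and tracking dead agents by id() is just one possible bookkeeping choice; a subclass such as RTNEATTeam would override is_episode_over, reset, and reset_all.

```python
from typing import Callable, List, Set


class NeroTeam:
    """A set of agents plus optional population-level AI (sketch).

    `agent_factory` stands in for the factory in agent.py and `pop_size`
    is a hypothetical fixed population size.
    """

    def __init__(self, agent_factory: Callable[..., object], pop_size: int):
        self.agent_factory = agent_factory
        self.pop_size = pop_size
        self.agents: List[object] = []
        self.dead: Set[int] = set()  # id() of each killed agent

    def create_agent(self, agent_ai: str, *args) -> object:
        agent = self.agent_factory(agent_ai, *args)
        self.agents.append(agent)
        return agent

    def create_agents(self, agent_ai: str) -> None:
        while len(self.agents) < self.pop_size:
            self.create_agent(agent_ai)

    def is_episode_over(self, agent) -> bool:
        return False  # a plain team never ends an agent's episode itself

    def reset(self, agent) -> None:
        pass  # a plain team has nothing to replace

    def kill_agent(self, agent) -> None:
        self.dead.add(id(agent))

    def reset_all(self) -> None:
        self.dead.clear()  # resurrect every agent

    def is_destroyed(self) -> bool:
        # True iff the team is non-empty and every agent has been killed.
        return bool(self.agents) and all(id(a) in self.dead for a in self.agents)

    def start_training(self) -> None:
        pass  # no training AI in a plain team

    def stop_training(self) -> None:
        pass
```

Keeping the lifecycle hooks as no-ops in the base class is what allows the same environment code to drive both a plain team and an evolving one.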

NERO Environment

NERO agents receive single-dimensional rewards bounded to [0, 1]. Combining multivariate rewards is the responsibility of the environment.
Distance-based reward components (distance to flag, etc.) are calculated as 1 / ((d * d / c) + 1), where d is the distance being measured and c is a scaling constant that spreads the reward out over the scale of the field.
Rewards are combined into a percentage of the possible reward given a set of weights. E.g. if the distance-to-flag weight is 1 and the approach-enemy weight is -0.5, the weighted sum of rewards is bounded by [-0.5, 1], which is then normalized to [0, 1]. At most 2/3 of the possible reward can come from flag distance and 1/3 from enemy distance, preserving the relationship between the reward weights.
The NERO environment now completely contains all features of the environment, including reward weights and spawn locations.
The NERO module.py is now mostly just an interface between the UI and the environment.
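A small Python sketch of the reward arithmetic described above. The function names are hypothetical, the normalization (shifting by the sum of the negative weights and dividing by the total span of the weights) is inferred from the [-0.5, 1] example, and returning 0 when every weight is zero is an assumption.

```python
from typing import Sequence


def distance_reward(d: float, c: float) -> float:
    """Reward in (0, 1] for a distance d; c spreads it over the field."""
    return 1.0 / ((d * d / c) + 1.0)


def combine_rewards(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Normalize a weighted sum of [0, 1] reward components into [0, 1].

    The weighted sum is bounded below by the sum of the negative weights and
    above by the sum of the positive weights; that interval is mapped to [0, 1].
    """
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    lo = sum(w for w in weights if w < 0)
    hi = sum(w for w in weights if w > 0)
    if hi == lo:  # all weights zero: no preference expressed (assumed 0)
        return 0.0
    s = sum(w * r for w, r in zip(weights, rewards))
    return (s - lo) / (hi - lo)
```

With weights [1, -0.5], being at the flag and far from the enemy (rewards [1, 0]) gives 1.0, the opposite (rewards [0, 1]) gives 0.0, and doubling both weights leaves every result unchanged, which is the consistency property listed in the design goals.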

Serialization

Using the "Save Team" functionality serializes a NeroTeam into JSON. Previously, different agents were not clearly distinguished in their serialized form, so deserializing required a state machine that looked for heuristic signatures of the different Agent representations. For instance, the word 'genomestart' signaled the beginning of an RTNEATAgent serialization, and could not be distinguished from any other Agent that used a NEAT::Genome as part of its serialized representation.
A team's JSON representation consists of two keys: team_ai and agents. team_ai maps to a string that the team factory method uses to decide which team class to instantiate, and agents maps to an array of serialized agents.
An agent's JSON representation consists of two keys: agent_ai and args. agent_ai maps to a string that the agent factory uses, and args maps to an array of arguments that can be passed to the agent constructor to create an agent with the appropriate state. For instance, NEATAgents put one line of their NEAT::Organism representation into each arg, and read the args back into a stream from which a NEAT::Organism can be constructed.
The C++ rtneat package had many classes with methods like print_to_file(std::string &filename), which created a lot of redundancy and mixed file system concerns with serialization concerns. These methods have been removed from the NEAT namespace classes and replaced with implementations of operator<< that write a serialization of an object to any std::ostream, and filename-based constructors have been replaced with std::istream-based constructors. This way, users of the rtneat package can handle file system operations themselves and flexibly serialize and deserialize individual components into whatever format they wish.
The C++ rtneat package had an incomplete and broken integration with Boost XML serialization. This confused matters, and it has been completely removed.
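A Python sketch of the JSON layout described above (team_ai/agents, agent_ai/args) and of the "one line per arg" trick. FakeOrganism and its text format are made up for illustration: write_to plays the role of operator<< on a std::ostream and from_stream the role of a std::istream constructor. The "rtneat" team_ai value is likewise hypothetical.

```python
import io
import json
from typing import Dict, List


class FakeOrganism:
    """Hypothetical stand-in for NEAT::Organism with stream-based I/O."""

    def __init__(self, genes: List[float]):
        self.genes = genes

    def write_to(self, out: io.TextIOBase) -> None:
        # Mirrors operator<<(std::ostream&): write a text serialization.
        out.write("genomestart\n")
        for g in self.genes:
            out.write(f"gene {g!r}\n")
        out.write("genomeend\n")

    @classmethod
    def from_stream(cls, inp: io.TextIOBase) -> "FakeOrganism":
        # Mirrors a std::istream constructor: parse the text back.
        genes = []
        for line in inp:
            parts = line.split()
            if parts and parts[0] == "gene":
                genes.append(float(parts[1]))
        return cls(genes)


def agent_to_json(agent_ai: str, organism: FakeOrganism) -> Dict:
    buf = io.StringIO()
    organism.write_to(buf)
    # One line of the organism's text representation per arg.
    return {"agent_ai": agent_ai, "args": buf.getvalue().splitlines()}


def agent_from_json(data: Dict) -> FakeOrganism:
    # Read the args back into a stream the organism can be constructed from.
    stream = io.StringIO("\n".join(data["args"]) + "\n")
    return FakeOrganism.from_stream(stream)


def team_to_json(team_ai: str, agents: List[Dict]) -> str:
    return json.dumps({"team_ai": team_ai, "agents": agents})
```

Because the agent_ai key names the class explicitly, the loader never has to guess from markers such as 'genomestart', and the organism code never touches the file system.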

RTNEAT Interface

The C++ RTNEAT class has been significantly simplified. It was previously the only interface between the OpenNERO simulation, the rtneat package, and Python modules, and had many disparate responsibilities.
RTNEAT wraps a NEAT::Population that must be instantiated outside of RTNEAT.
RTNEAT is an OpenNERO "puppeteer" AI that periodically marks a low-fitness NEAT::Organism in its population for removal.
RTNEAT defines a method for replacing the organisms it kills.
RTNEAT does not do any brain-body mapping. This was needlessly complicated anyway.
RTNEAT does not do any file system operations.
RTNEAT does not do any fitness evaluation. It expects up-to-date univariate fitness values to be assigned to each organism it manages. Defining the fitness function and combining multivariate rewards is now the responsibility of the experimenter.
NEAT classes such as Population, Organism, Genome, etc. have Boost.Python wrappers and can be used directly by Python modules without going through RTNEAT.
"Lifetime" is no longer conflated to mean both the RTNEAT minimum time alive and the agent maximum time alive. Minimum lifetime, maximum lifetime, and the RTNEAT kill rate are now distinct parameters.

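A heavily simplified Python sketch of the puppeteer loop described above, assuming hypothetical names (RTNEATSketch, min_lifetime, kill_interval, breed). Fitness is assigned from outside, as the overhaul requires; the returned organism plays the role of the removal mark. Real rtNEAT chooses parents and reproduces within species, which the breed callable only stands in for.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)  # eq=False: organisms compare by identity
class Organism:
    fitness: float = 0.0  # kept up to date by the experimenter, not RTNEAT
    age: int = 0          # ticks since creation


class RTNEATSketch:
    """Periodically marks a low-fitness organism for removal (sketch)."""

    def __init__(self, population: List[Organism], min_lifetime: int,
                 kill_interval: int,
                 breed: Callable[[List[Organism]], Organism]):
        self.population = population  # instantiated outside RTNEAT
        self.min_lifetime = min_lifetime
        self.kill_interval = kill_interval
        self.breed = breed  # hypothetical replacement strategy
        self.ticks = 0

    def tick(self) -> Optional[Organism]:
        """Advance one step; return the organism marked for removal, if any."""
        self.ticks += 1
        for org in self.population:
            org.age += 1
        if self.ticks % self.kill_interval != 0:
            return None
        # Only organisms that have lived at least min_lifetime are candidates.
        eligible = [o for o in self.population if o.age >= self.min_lifetime]
        if not eligible:
            return None
        return min(eligible, key=lambda o: o.fitness)

    def replace(self, dead: Organism) -> Organism:
        """Swap a killed organism for a newly bred one in the same slot."""
        child = self.breed(self.population)
        self.population[self.population.index(dead)] = child
        return child
```

Keeping min_lifetime and kill_interval as separate constructor arguments mirrors the split of "lifetime" into distinct parameters; the agent's maximum lifetime would live on the agent side.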
Battle Training

The NERO_Battle module has been updated to make use of the changes made to NERO.
In battle mode, reset(agent) despawns the agent and calls team.kill_agent(agent) instead of team.reset(agent), i.e. dead agents stay dead until the end of a battle.
The battle environment has a "global" tick hook into the simulation. Instead of trying to detect and score the end of a battle after each individual agent action, it can evaluate the state of the battle after each "round" in which every agent acts. This is actually pretty similar to how it worked before, just less hackish.
When the environment detects the end of the battle, it reports the score, resets both teams, and starts a new battle automatically.
The battle module imposes no restrictions on learning during a battle or evolution between battles. Unlike the NERO environment, it does not call team.start_training or team.stop_training, which lets team implementations distinguish between training AI and battle AI. For instance, RTNEATTeam runs the RTNEAT AI in training mode (killing agents periodically), but not in battle mode (that would be a pretty bad strategy). Furthermore, it implements generational NEAT in reset_all(), allowing it to evolve between battles.
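A Python sketch of the battle flow described above: reset(agent) kills instead of respawning, and a once-per-round global tick checks for a destroyed team, records a score, and resets both teams. BattleTeam, BattleSketch, and the scoring rule (surviving agents per team) are hypothetical; despawning is omitted.

```python
from typing import Dict, List, Optional


class BattleTeam:
    """Minimal team stub that tracks which agents are alive (sketch)."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.alive = [True] * size

    def kill_agent(self, index: int) -> None:
        self.alive[index] = False

    def is_destroyed(self) -> bool:
        return len(self.alive) > 0 and not any(self.alive)

    def reset_all(self) -> None:
        # An RTNEATTeam could also run a generational NEAT epoch here.
        self.alive = [True] * len(self.alive)


class BattleSketch:
    def __init__(self, teams: List[BattleTeam]):
        self.teams = teams
        self.scores: List[Dict[str, int]] = []

    def reset(self, team: BattleTeam, index: int) -> None:
        # Battle mode: a dead agent stays dead until the battle ends.
        team.kill_agent(index)

    def global_tick(self) -> Optional[Dict[str, int]]:
        """Called once per round, after every agent has acted."""
        if not any(t.is_destroyed() for t in self.teams):
            return None
        # Hypothetical score: number of surviving agents per team.
        score = {t.name: sum(t.alive) for t in self.teams}
        self.scores.append(score)
        for t in self.teams:
            t.reset_all()  # start a new battle automatically
        return score
```

Since the battle only ever calls reset_all between battles, a team is free to decide whether that call also evolves its population.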

gjacobrobertson commented 8 years ago

TODO

Update Roomba module for new RTNEAT interface
Update Tabular Q-Learning Agent for new interfaces.
BUG: users can no longer move team spawn points.
NEAT::Population::epoch can segfault when every organism's fitness is 0. This is currently avoided by not calling epoch in that situation, but the epoch function is 500+ lines of code and could probably use some cleanup anyway.
Update Battle Mode user interface for setting reward weights, saving teams, etc.
MAYBE: It's possible that Training and Battle no longer need to be distinct modules, but could share a single interface with something like checkboxes for which kinds of AIs should be running, e.g. a Team Training checkbox, Agent Learning checkboxes, evolution, etc.
BONUS: If we unify battle and training into a single interface, maybe we could drop Java as a dependency entirely.

nnrg / opennero

NERO overhaul #163
