nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Three Evolution Modes: No Learning, Baldwinian, and Lamarckian #30

Closed by schrum2 5 years ago

schrum2 commented 5 years ago

We should have three modes of evolution we can compare.

No Learning: If a global or command line parameter indicates that no learning should occur, then simply do not execute the learn method when evaluating a member of the population. Skip straight to evaluate, which doesn't change the weights.

Baldwinian: This is what we are currently doing. Load the genome weights, change them via learning with PPO, but then discard the learned weights and pass the original evolved weights on to offspring if the agent performed well enough after learning.

Lamarckian: If a global or command line parameter indicates that Lamarckian evolution should be used, then after executing the evaluate method to get fitness and behavior characterization, copy the learned weights from the CNN back into the solution genome. Parents pass the benefits of learning on to their offspring.
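
A rough sketch of how this three-way split could look inside the evaluation loop, assuming the set_weights and learn functions already mentioned in this thread; the evaluate call's return values and the copy_weights helper are illustrative assumptions, not the final implementation:

    if args.evol_mode in ('baldwin', 'lamarck'):
        # Load the evolved genome weights into the network before learning.
        weights = torch.from_numpy(solutions[i]).type(torch.FloatTensor)
        set_weights(agent.actor_critic, weights)
        learn(envs, agent)  # PPO learning phase (skipped entirely for 'none')
    # Fitness/behavior evaluation does not change the weights.
    fitness, behavior = evaluate(envs, agent)
    if args.evol_mode == 'lamarck':
        # Lamarckian: write the learned weights back into the genome
        # so offspring inherit the benefits of learning.
        solutions[i] = copy_weights(agent.actor_critic, solutions[i])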

nazaruka commented 5 years ago

Added the --evol-mode argument, which takes only the values none, baldwin, and lamarck. Since Lamarckian evolution has not been implemented just yet, I revised the code in the evaluate_population method to read as follows:

def evaluate_population(solutions, agent):
    ...
    for i in range(pop_size):
        ...
        if args.evol_mode in {'baldwin', 'lamarck'}:
            weights = torch.from_numpy(solutions[i])
            weights = weights.type(torch.FloatTensor)
            set_weights(agent.actor_critic, weights)

            print("Learning.", end=" ")
            learn(envs, agent)

So with none, the agent neither loads new weights nor learns, while baldwin and lamarck both do.
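
For reference, one way the argument could be defined so that argparse itself rejects any value other than the three modes (the default shown here is an assumption):

    import argparse

    parser = argparse.ArgumentParser()
    # 'choices' makes argparse reject anything besides the three supported modes.
    parser.add_argument('--evol-mode', default='baldwin',
                        choices=['none', 'baldwin', 'lamarck'],
                        help='evolution mode: no learning, Baldwinian, or Lamarckian')
    args = parser.parse_args()  # available afterwards as args.evol_mode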

If you want to run the small test I did, you can execute python NSGAII.py --evol-mode none --num_gens 1 --pop_size 3 --use-gae --num-processes 1 --num-steps 128 --num-mini-batch 1 --no-cuda --use-proper-time-limits --recurrent-policy (NB: this crashes with the same type issue we have been working on.)

nazaruka commented 5 years ago

Implementing copy_weights was successful at first, but running the code then produced the same weight-type issue as before. Fortunately, the easy fix on line 326:

cnn_weights = np.concatenate((cnn_weights, layer_array)).astype(np.float32)

ensured that the method kept working with the genomes' offspring. The command line parameters work without problems.
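
For context, a minimal sketch of what a copy_weights-style helper might look like, flattening the learned network parameters back into a float32 genome vector; the signature and loop here are assumptions, but the astype(np.float32) cast matches the fix above:

    import numpy as np

    def copy_weights(network, solution):
        # Flatten each layer's learned parameters into one numpy vector.
        cnn_weights = np.array([], dtype=np.float32)
        for param in network.parameters():
            layer_array = param.data.cpu().numpy().flatten()
            # Cast to float32 so the genome dtype matches what set_weights expects.
            cnn_weights = np.concatenate((cnn_weights, layer_array)).astype(np.float32)
        # Copy the learned weights back into the solution genome (Lamarckian step).
        solution[:] = cnn_weights
        return solution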