Implement MOME - Githubissues

schrum2 commented 1 year ago

This idea comes from this paper: https://arxiv.org/pdf/2202.03057.pdf

It will be a challenging issue requiring a lot of work. The idea is to combine MAP-Elites with NSGA-II. NSGA-II is a multiobjective evolutionary algorithm that my code contains an implementation of. You can read about NSGA-II here: https://cs.uwlax.edu/~dmathias/cs419/readings/NSGAIIElitistMultiobjectiveGA.pdf However, it is probably better to just talk to me about it.

The idea behind MOME is that it is like MAP-Elites, except that each cell of the grid contains a Pareto front instead of a single elite solution. A Pareto front is a collection of solutions, where none is strictly better than any of the others in a multiobjective sense. This means that MOME needs to adapt MAP-Elites to support a sub-population within each grid cell. However, I don't think MOME can extend the MAPElites code in the project, since it is designed for just one solution per bin. The same is true of the Archive class.

So, make a new package edu.southwestern.evolution.mome that contains the MOME class and MOMEArchive class. The MOMEArchive will be similar to Archive except that each grid cell can contain a variable number of individuals, and MOME will be like MAPElites, but with the same added flexibility. Note that the MAPElites class has a lot of accumulated code that we won't be trying to copy or adapt, at least at first, We want MOME to be simple and streamlined at first, so when adapting code from MAPElites, don't worry about logging or about weird special cases if you can avoid it.

We might find a way to unify these classes down the line, but for now they will mostly be standalone.

From the NSGA2 class, you'll want to make use of the getParetoFront method and the getNSGA2Scores method

schrum2 commented 1 year ago

Make it so that MOME<T> implements SteadyStateEA<T> and Eclipse will suggest a fix for adding the methods that you need to implement. This will provide a minimal implementation of the code.

JoannaBlatt commented 1 year ago

Was able to run an experiment from main. It ran for about an hour and produced one flying machine. Currently seems to work fine. Did not do a successful maven build.

schrum2 commented 1 year ago

I'm running the MOME test now, though my latest commit did fix some minor issues.

Do a test with two objectives. After merging my branch, add the command line parameter that sets the maximize volume fitness to true, and then run that.

In the meantime, while that is running, start working on some basic logging. See if we can just log the number of occupied bins first (the number of Pareto fronts/sub-populations). Another thing to log is the overall number of individuals in the archive (probably need a method for this).

There is more, but start here.

JoannaBlatt commented 1 year ago

I got logging to work. It now puts information into a text file and updates based on the number of generations

schrum2 commented 1 year ago

Here are some more things to log.

Max fitness in each objective
Min fitness in each objective
Max sub pop size across all bins
Min sub pop size across occupied bins (ignore empty ones)
Max hyper volume in one bin
hypervolume across all bins
size of combined Pareto front across all bins

schrum2 commented 1 year ago

I did a simple run with one objective (in theory, equivalent to plain MAP Elites) and it produced flying machines

schrum2 commented 1 year ago

After resolving #916 my batch file for this seems to run. @JoannaBlatt should merge with my branch before doing any additional work. Although my test could reveal some errors, there is still work to be done even in the absence of errors.

The initialization of MOME needs to write the gnuplot files that will display results from the logs.

More logging is needed. I believe there is already an archive log variable that isn't being used. Here is what it should track ... although it actually might make more sense to have several separate logs for each of the pieces of information I'm about to describe.

First, understand that for regular MAP Elites, the archive log stores the scores of all elites, but it also has a column for every bin in the archive. That means that is has a lot of X marks (I believe this is how it is encoded) indicating an empty cell. Then the fitness gets filled in as the rows go down. In MAP Elites, this file is how both the final 2D heatmap archive visualizations are possible, and how the linear heatmaps across time are made (the ones that fill up from bottom to top). Clearly, it would be beneficial to make visualizations like this for MOME as well, but we have more information to track. Here is what I propose.

For each objective, make a separate file that stores the max fitness for each bin in the archive. Like all logs, the first column is the pseudo generation, and then have one column for every label in the bin labels (these correspond to the bin coordinates). For each bin that is not in the map, print an X (or whatever MAP Elites is using), and otherwise print the maximum fitness in that objective for that one bin across all of its members. This gives us a file with the same structure as the logs produced by MAP Elites, so we can use all of the existing log techniques to visualize the data ... we just need to point the plot commands to these new files.

This idea can generalize to several other things that we can plot. One other obvious thing to consider is the minimum fitness for each objective, and I think it could also be interesting to track the difference between the min and max fitness as a logged entity as well, since this gives a sense for what the spread of values in the Pareto front is (values of 0 would indicate Pareto fronts dominated by individual elites).

We could also have a log that tracked the hypervolume for each individual bin's Pareto fronts.

By the way, you can calculate hypervolume using jmetal.qualityIndicator.Hypervolume. However, I believe the only other example of how to use this that exists in the code reads the data from a text file rather than directly from an array or list of scores, so a little bit of data manipulation is needed. However, this code is not well documented (it came from some other research project, and honestly I haven't used it much). You will definitely need to use the calculateHypervolume method. This assumes you have a 2D array of scores that already represents scores from a Pareto front. However, the more I look at this, I'm realizing that we need a whole separate issue for making it easy to calculate a hypervolume, so I'll make that and refer to it here #917.

schrum2 commented 1 year ago

Order of output to main log: generation, number of occupied bins, number of individuals in archive (across all bins), max objective 1, ... , max objective n, min objective 1, ... , min objective n, hypervolume

schrum2 commented 1 year ago

Order for separate archive logs. There will be a completely separate log file for each objective:

generation, max bin 1, max bin 2, ... , max bin last, min bin 1, min bin 2, ... , min bin last,

JoannaBlatt commented 1 year ago

currently logs: population of bins max and min of each objective max and min gnu plots archive log archive gnu

next: logging shapes issue it's not saving shapes in the archive

schrum2 commented 1 year ago

In setUpLogging under fullName there is a comment:

// TODO: The .plt file for the log of general archive information will be very different than the numerous MOMELogs below

Please address this. To get a sense for how this should be plotted, look at the setUpLogging method of MAP Elites, starting at the ps = new PrintStream(fillPlot);. This one plot file is responsible for creating multiple plots. Specifically, these are line plots where the x-axis is the pseudo generation and the y-axis is a score or scores. Each separate plot defines its own title, output file, and plot command. You already have a log file containing information that can be plotted, and it seems to currently contain:

Number of Occupied Bins
Total Individuals in Archive
max in each objective
min in each objective

Start with this existing data, and create one plot for the occupied bins, one for the total individuals, and then one plot for each objective: each of these plots will contain both the min and the max (they are on the same scale).

Additional things that should be logged and eventually plotted include:

Overall hypervolume of archive and Max hypervolume across individual bins
Percentage of bins filled
Number of discarded individuals
Some variants of QD score, but I need to think a bit about what QD score means in a multiobjective context first

schrum2 commented 1 year ago

Seems complete. Any further work on MOME can occur in more targeted issues.

schrum2 / MM-NEAT

Implement MOME #891