onclave / NSGA-II

An implementation of NSGA-II in Java
MIT License

Question about adding own objective function #11

Open SS-LAN opened 3 years ago

SS-LAN commented 3 years ago

Hi @onclave. I read the reply you recently gave someone about generating a population of n chromosomes and using each chromosome's genetic code (alleles of 1 or 0) as a reference to dataset columns (attributes), to either select or discard them for feature selection.

My project also uses NSGA-2 for feature selection, and I am using symmetric uncertainty (SU) as the objective function to calculate the fitness of each dataset column based on the chromosome generated as a solution. One objective is a maximisation function that maximises relevance, and a second function reduces redundancy. I have created a population from the dataset I am using. The issue I am facing is that I cannot integrate my SU function with the objective provider of your NSGA-2 package: I want to generate binary-encoded chromosomes (with the default binary encoder provided in your library), use the chromosome indices to refer to my dataset, produce a subset of the dataset based on each chromosome solution, and evaluate it with the objective function. I am using a 2D dataset with n columns and m rows.

Objective function details (SU): Symmetric uncertainty can be used to calculate the fitness of features for feature selection by computing it between each feature and the target class. A feature with a high SU value gets high importance. Symmetric uncertainty is defined as SU(X, Y) = 2 * MI(X, Y) / (H(X) + H(Y))
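For reference, the formula above can be sketched in Java as follows. This is only an illustration, not code from the library or from the poster's project: it assumes both columns hold discrete integer labels (continuous columns would need to be discretized first), takes H as Shannon entropy in bits, and computes MI(X, Y) as H(X) + H(Y) - H(X, Y). The class name `SymmetricUncertainty` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the NSGA-II library.
final class SymmetricUncertainty {

    // Shannon entropy in bits of the empirical distribution described by value counts.
    static double entropy(Map<?, Integer> counts, int n) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // SU(X, Y) = 2 * MI(X, Y) / (H(X) + H(Y)), with MI = H(X) + H(Y) - H(X, Y).
    // Assumes x and y are discrete labels of equal, non-zero length.
    static double su(int[] x, int[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("x and y must be non-empty and of equal length");
        }
        int n = x.length;
        Map<Integer, Integer> countX = new HashMap<>();
        Map<Integer, Integer> countY = new HashMap<>();
        Map<Long, Integer> countXY = new HashMap<>();
        for (int i = 0; i < n; i++) {
            countX.merge(x[i], 1, Integer::sum);
            countY.merge(y[i], 1, Integer::sum);
            // Pack the (x, y) pair into one long so it can be used as a joint key.
            long pair = ((long) x[i] << 32) | (y[i] & 0xFFFFFFFFL);
            countXY.merge(pair, 1, Integer::sum);
        }
        double hx = entropy(countX, n);
        double hy = entropy(countY, n);
        double hxy = entropy(countXY, n);
        double denominator = hx + hy;
        if (denominator == 0.0) {
            return 0.0; // both columns are constant; SU is treated as 0 here by convention
        }
        return 2.0 * (hx + hy - hxy) / denominator;
    }
}
```

Under these assumptions, a feature identical to the class column scores 1 and a feature independent of it scores 0.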

So, basically, I am trying to build a filter-based solution to my problem using a multi-objective algorithm (NSGA-2), where I will test the generated subset on a classifier.

I had a look at the comment you made on someone's issue and the guidance you provided in the documentation section, but I am not sure how to go about doing that. Could you provide a pseudocode-style explanation or the steps I can follow to achieve this?

Thanks - A Dj

onclave commented 3 years ago

Hello,

You are probably referring to issue #7. You can follow this comment to further understand how to use your own dataset with the library.

Regarding your query, I would need a few more details before I can help:

  1. Provide a short description of your dataset. If possible, provide a small subset of your dataset so that I can take a look at it and understand what you are working with.
  2. What are your objective functions? From your issue, I am guessing you have 2 OFs: one is a maximization problem, while the other is a minimization problem?

Symmetric uncertainty defined as Formula = SU(X,Y)=(2*MI(X,Y))/(H(X)+H(Y))

  3. Define the individual terms of the equation.

SS-LAN commented 3 years ago

[image: flowchart]

Symmetric uncertainty (maximise relevance) is for selecting only relevant features into an sList, and the second objective function removes redundant features. See the flowchart above. The two equations are related: as soon as I receive the results of the first objective (e.g. SU values ranked from best to worst), they move into the second objective, which removes redundant features when preparing the feature subset. But I am not sure how to let the code refer to the chromosome's genetic code (1 to select and 0 to discard an attribute) and then pass the selected attributes into my objective functions. At the moment I have tested the objective functions individually, and they do what I want separately, outside NSGA-2. But I want the subset to be generated based on the chromosome from NSGA-2. Is there a way I can achieve this?

Dataset: Book1.zip

SS-LAN commented 3 years ago

So, just to be clear, my objectives will take in the subset generated based on the chromosome and then evaluate it. But I am not sure how to let the Java code select the subset based on the chromosome's genetic code index.

onclave commented 3 years ago

Hello @SS-LAN ,

Sorry for the late reply; I was caught up with my own work. Anyway,

From what I understand regarding your problem, you can map your dataset to the binary encoded chromosomes in the following way:

  1. Express your dataset, or the subset of it that you want to work with, as a 2D array. (I would also recommend expressing the dataset in OOP terms, but use whatever is easier to understand for you.)
  2. Implement your own objective function classes extending AbstractObjectiveFunction.
  3. While implementing your objective functions, you could pass this 2D representation of your dataset as a constructor parameter. My library only calls the getValue(Chromosome) method of the objective function, so you can set it up during initialization as you wish.

// implementing your objective function
public class MyObjectiveFunctionOne extends AbstractObjectiveFunction {

    private final double[][] dataset;

    // you set your dataset here during initialization
    public MyObjectiveFunctionOne(double[][] dataset) {
        this.dataset = dataset;
    }

    @Override
    public double getValue(Chromosome chromosome) {
        // both the chromosome and the dataset are available here;
        // compute the objective from them and return it as a double
        throw new UnsupportedOperationException("objective not implemented yet");
    }
}

You can implement as many objective functions as you want, each using your dataset.

  4. Create instances of your objective functions and add them to the library's configuration for the algorithm to use.

// make a 2D array of your dataset in your code
double[][] dataset = create2DarrayOfDataset();

// create instances of your objective functions passing the 2D dataset array
MyObjectiveFunctionOne ob1 = new MyObjectiveFunctionOne(dataset);
MyObjectiveFunctionTwo ob2 = new MyObjectiveFunctionTwo(dataset);
...

List<AbstractObjectiveFunction> objectives = new ArrayList<>();

// adding the custom objectives created above
// you can add as many objectives as you want
objectives.add(ob1);
objectives.add(ob2);
...

// creating your configuration with the new objectives
Configuration configuration = new Configuration(objectives);
NSGA2 nsga2 = new NSGA2(configuration);

//run() returns the final child population or the pareto front
Population paretoFront = nsga2.run();

Once you have access to your dataset in your objective functions, you can use `Chromosome#getGeneticCode()` to get the genetic code. Compare it against your 2D dataset array and select the row / column indices where the genetic code is "true" or 1.
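The selection step described above could look roughly like the sketch below. It assumes the genetic code has already been converted into a plain `boolean[]` with one entry per dataset column; how to read a boolean out of each allele returned by `Chromosome#getGeneticCode()` depends on the library version and is not shown here. The class `FeatureSubset` and its methods are hypothetical helpers, not part of the library.

```java
// Hypothetical helper, not part of the NSGA-II library.
final class FeatureSubset {

    // Returns the indices of the columns whose mask entry is true.
    static int[] selectedColumns(boolean[] mask) {
        int count = 0;
        for (boolean selected : mask) {
            if (selected) {
                count++;
            }
        }
        int[] columns = new int[count];
        int next = 0;
        for (int c = 0; c < mask.length; c++) {
            if (mask[c]) {
                columns[next++] = c;
            }
        }
        return columns;
    }

    // Copies only the selected columns of every row into a new 2D array.
    // Assumes every row has exactly mask.length columns.
    static double[][] project(double[][] dataset, boolean[] mask) {
        int[] columns = selectedColumns(mask);
        double[][] subset = new double[dataset.length][columns.length];
        for (int r = 0; r < dataset.length; r++) {
            if (dataset[r].length != mask.length) {
                throw new IllegalArgumentException(
                        "row " + r + " has " + dataset[r].length
                                + " columns, but the mask has " + mask.length);
            }
            for (int k = 0; k < columns.length; k++) {
                subset[r][k] = dataset[r][columns[k]];
            }
        }
        return subset;
    }
}
```

Inside `getValue(Chromosome)`, the projected subset (or just the selected column indices) would then be passed to the objective calculation. A chromosome with no selected columns produces rows of length 0, so each objective needs to decide what value to return in that case.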

I hope this helps.