Open SS-LAN opened 3 years ago
Hello,
You are probably referring to issue #7. You can follow that comment to further understand how to use your own dataset with the library.
Regarding your query, I would need a few more details before I can help:
Symmetric uncertainty is defined as SU(X, Y) = 2 * MI(X, Y) / (H(X) + H(Y)).
First objective (symmetric uncertainty, maximise relevance): select only relevant features into a list (sList). The second objective function removes redundant features. See the above flowchart. The two objectives are linked: once I receive the results of the first objective (e.g. SU values ranked from best to worst), they move into the second objective, which removes redundant features while preparing the feature subset. But I am not sure how to make the code read the chromosome's genetic code (e.g. 1 to select and 0 to discard an attribute) and then pass the selected attributes into my objective functions. At the moment I have tested the objective functions individually, and they do what I want separately, outside NSGA-2. But I want the subset to be generated from the NSGA-2 chromosome. Is there a way I can achieve this?
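One possible reading of the two objectives described above, written as plain functions over a 0/1 mask, might look like the sketch below. It assumes the SU values have already been computed: `suWithClass[i]` is SU between feature i and the class, and `suBetweenFeatures[i][j]` is SU between features i and j. The class and method names are hypothetical and not part of the library, and averaging is only one of several ways to aggregate.

```java
final class SubsetObjectives {

    // Relevance: mean SU between each selected feature and the class (higher is better).
    static double relevance(boolean[] mask, double[] suWithClass) {
        double sum = 0.0;
        int selected = 0;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                sum += suWithClass[i];
                selected++;
            }
        }
        // An empty subset carries no relevance at all.
        return selected == 0 ? 0.0 : sum / selected;
    }

    // Redundancy: mean pairwise SU among the selected features (lower is better).
    static double redundancy(boolean[] mask, double[][] suBetweenFeatures) {
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < mask.length; i++) {
            if (!mask[i]) continue;
            for (int j = i + 1; j < mask.length; j++) {
                if (mask[j]) {
                    sum += suBetweenFeatures[i][j];
                    pairs++;
                }
            }
        }
        // Fewer than two selected features means there is nothing to be redundant with.
        return pairs == 0 ? 0.0 : sum / pairs;
    }
}
```

Because one value is to be maximised and the other minimised, one of them may need to be negated if the optimiser expects both objectives to point in the same direction.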
Dataset: Book1.zip
So, just to be clear: my objectives will take in a subset generated from the chromosome and then evaluate it. But I am not sure how to let the Java code select the subset based on the chromosome's genetic code indices.
Hello @SS-LAN,
Sorry for being a bit late to reply regarding your problem, was caught up with my own work. Anyway,
From what I understand regarding your problem, you can map your dataset to the binary encoded chromosomes in the following way:
The dataset is not passed to the `AbstractObjectiveFunction.getValue(Chromosome)` method of the objective function, so you could set it up during initialization as you wish.

    // implementing your objective function
    public class MyObjectiveFunctionOne extends AbstractObjectiveFunction {

        private final double[][] dataset;

        // you set your dataset here during initialization
        public MyObjectiveFunctionOne(double[][] dataset) {
            this.dataset = dataset;
        }

        @Override
        public double getValue(Chromosome chromosome) {
            // both the chromosome and the dataset are accessible here;
            // do as required and return a double value
            return 0.0; // placeholder so the class compiles; replace with your computed value
        }
    }

You can implement as many objective functions as you want to use your dataset.

    // make a 2D array of your dataset in your code
    double[][] dataset = create2DarrayOfDataset();

    // create instances of your objective functions, passing the 2D dataset array
    MyObjectiveFunctionOne ob1 = new MyObjectiveFunctionOne(dataset);
    MyObjectiveFunctionTwo ob2 = new MyObjectiveFunctionTwo(dataset);
    ...

    List<AbstractObjectiveFunction> objectives = new ArrayList<>();

    // add your custom objectives (the instances created above);
    // you can add as many objectives as you want
    objectives.add(ob1);
    objectives.add(ob2);
    ...

    // create your configuration with the new objectives
    Configuration configuration = new Configuration(objectives);
    NSGA2 nsga2 = new NSGA2(configuration);

    // run() returns the final child population or the pareto front
    Population paretoFront = nsga2.run();

Once you have access to your dataset in your objective functions, you can then use `Chromosome#getGeneticCode()` to get your genetic code. Compare this to your 2D dataset array and select row / column indices where the genetic code is "true" or 1.
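A minimal sketch of the column selection described above, assuming the genetic code has already been copied into a plain `boolean[]` (how each allele is read out of `Chromosome#getGeneticCode()` depends on the library version and is not shown here). `FeatureSubsets`, `selectedColumns`, and `project` are hypothetical names, not part of the library, and the dataset is assumed to hold one sample per row and one feature per column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FeatureSubsets {

    // Returns the indices of the columns whose gene is set (true = keep, false = discard).
    static int[] selectedColumns(boolean[] mask) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                kept.add(i);
            }
        }
        return kept.stream().mapToInt(Integer::intValue).toArray();
    }

    // Copies only the selected columns of the dataset into a new 2D array.
    static double[][] project(double[][] dataset, boolean[] mask) {
        int[] cols = selectedColumns(mask);
        double[][] subset = new double[dataset.length][cols.length];
        for (int r = 0; r < dataset.length; r++) {
            for (int c = 0; c < cols.length; c++) {
                subset[r][c] = dataset[r][cols[c]];
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        double[][] dataset = { {1, 2, 3}, {4, 5, 6} };
        boolean[] mask = { true, false, true }; // keep columns 0 and 2
        System.out.println(Arrays.deepToString(project(dataset, mask)));
        // prints [[1.0, 3.0], [4.0, 6.0]]
    }
}
```

Inside `getValue(Chromosome)`, the projected array (or just the selected indices) could then be handed to whatever measure the objective computes.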
I hope this helps.
Hi @onclave. I recently read your comment on someone else's query about generating a population of n chromosomes and using each chromosome's genetic code (i.e. allele 1 or 0) as a reference to dataset columns (attributes), to either select or discard them for feature selection.
My project also focuses on using NSGA-2 for feature selection, and I am using symmetric uncertainty (SU) as the objective function to calculate the fitness of each dataset column selected by the chromosome generated as a solution. One objective function maximises relevance, and another function reduces redundancy. I have created a population out of the dataset that I am using. However, the issue I am currently facing is that I am not able to integrate my SU function with the objective provider of your NSGA-2 package: I want to generate binary encoded chromosomes (with the default binary encoder provided in your library), use the chromosome indices to refer to my dataset so as to produce a subset of the dataset for each chromosome solution, and evaluate that subset with the objective function. I am using a 2D dataset with n columns and m rows.
Objective function details (SU): Symmetric uncertainty can be used to calculate the fitness of features for feature selection by computing it between each feature and the target class. A feature with a high SU value gets high importance. Symmetric uncertainty is defined as SU(X, Y) = 2 * MI(X, Y) / (H(X) + H(Y)), where MI is the mutual information and H is the entropy.
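As an illustration of the formula above, here is a small sketch that computes SU for two discrete variables. It assumes the feature and the class have already been discretised into integer category codes (continuous columns would need binning first), uses base-2 logarithms, and returns 0 when both variables are constant, which is a convention chosen for this sketch. The class name `SymmetricUncertainty` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SymmetricUncertainty {

    // Shannon entropy (base 2) of the empirical distribution described by the counts.
    private static double entropy(Map<?, Integer> counts, int n) {
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / n;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // SU(X, Y) = 2 * MI(X, Y) / (H(X) + H(Y)), using MI(X, Y) = H(X) + H(Y) - H(X, Y).
    static double su(int[] x, int[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("x and y must be non-empty and of equal length");
        }
        int n = x.length;
        Map<Integer, Integer> countsX = new HashMap<>();
        Map<Integer, Integer> countsY = new HashMap<>();
        Map<Long, Integer> countsXY = new HashMap<>();
        for (int i = 0; i < n; i++) {
            countsX.merge(x[i], 1, Integer::sum);
            countsY.merge(y[i], 1, Integer::sum);
            // Pack the (x, y) pair into one long so it can be used as a joint key.
            long pair = ((long) x[i] << 32) | (y[i] & 0xFFFFFFFFL);
            countsXY.merge(pair, 1, Integer::sum);
        }
        double hx = entropy(countsX, n);
        double hy = entropy(countsY, n);
        double hxy = entropy(countsXY, n);
        double denominator = hx + hy;
        if (denominator == 0.0) {
            return 0.0; // both variables are constant; convention chosen for this sketch
        }
        return 2.0 * (hx + hy - hxy) / denominator;
    }
}
```

With this definition SU ranges from 0 (the empirical distributions are independent) to 1 (each variable fully determines the other).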
So, basically, I am trying to build a filter-based solution to my problem using a multi-objective algorithm (NSGA-2), where I will then test the generated subset on a classifier.
I had a look at the comment you made on someone's issue and the guidance you provided in the documentation section, but I am not sure how to go about doing that. So, can you provide a pseudocode-style explanation or the steps I can follow to achieve this?
Thanks - A Dj