Open matthieu-vergne opened 4 years ago
As a side note, all of that is what I planned to do iteratively, with successive deprecations. It would have been long, but progressive. So we can actually do it that way in jMetal 6 but more quickly, removing directly what we don't need anymore instead of deprecating.
I agree that the current Solution
interface has become a bit cumbersome and it contains many redundant methods. However, I was thinking something like this:
public interface Solution<T> extends Serializable {
List<T> variables() ;
double[] objectives() ;
double[] constraints() ;
Map<Object, Object> attributes();
}
Why use arrays?. I commented you that some components in jMetal can be generalized in the sense of not having to depend on jMetal classes. For example, a method that is intensively used in many algorithms is the dominanceTest()
, which now is:
private int dominanceTest(S solution1, S solution2) {
int bestIsOne = 0;
int bestIsTwo = 0;
int result;
for (int i = 0; i < solution1.getNumberOfObjectives(); i++) {
double value1 = solution1.getObjective(i);
double value2 = solution2.getObjective(i);
if (value1 != value2) {
if (value1 < value2) {
bestIsOne = 1;
}
if (value2 < value1) {
bestIsTwo = 1;
}
}
}
if (bestIsOne > bestIsTwo) {
result = -1;
} else if (bestIsTwo > bestIsOne) {
result = 1;
} else {
result = 0;
}
return result;
}
}
However, it can be implemented in this way:
public static int dominanceTest(double[] vector1, double[] vector2) {
int bestIsOne = 0;
int bestIsTwo = 0;
int result;
for (int i = 0; i < vector1.length; i++) {
double value1 = vector1[i];
double value2 = vector2[i];
if (value1 != value2) {
if (value1 < value2) {
bestIsOne = 1;
}
if (value2 < value1) {
bestIsTwo = 1;
}
}
}
result = Integer.compare(bestIsTwo, bestIsOne);
return result;
}
Another example are the quality indicators. Instead of having an evaluate()
method like this:
@Override public Double evaluate(List<S> solutionList) {...}
it can be replaced by:
@Override public double compute(double[][] front) {...}
I also want to get rid off the jmetal.util.front
package, as a front is basically a double[][]
matrix. So, a method like this:
@Test
public void shouldExecuteReturnTheCorrectValueCaseA() {
int numberOfPoints = 3 ;
int numberOfDimensions = 2 ;
Front frontApproximation = new ArrayFront(numberOfPoints, numberOfDimensions);
Front referenceFront = new ArrayFront(numberOfPoints, numberOfDimensions);
Point point1 = new ArrayPoint(numberOfDimensions) ;
point1.setValue(0, 1.5);
point1.setValue(1, 4.0);
Point point2 = new ArrayPoint(numberOfDimensions) ;
point2.setValue(0, 2.0);
point2.setValue(1, 3.0);
Point point3 = new ArrayPoint(numberOfDimensions) ;
point3.setValue(0, 3.0);
point3.setValue(1, 2.0);
frontApproximation.setPoint(0, point1);
frontApproximation.setPoint(1, point2);
frontApproximation.setPoint(2, point3);
Point point4 = new ArrayPoint(numberOfDimensions) ;
point4.setValue(0, 1.0);
point4.setValue(1, 3.0);
Point point5 = new ArrayPoint(numberOfDimensions) ;
point5.setValue(0, 1.5);
point5.setValue(1, 2.0);
Point point6 = new ArrayPoint(numberOfDimensions) ;
point6.setValue(0, 2.0);
point6.setValue(1, 1.5);
referenceFront.setPoint(0, point4);
referenceFront.setPoint(1, point5);
referenceFront.setPoint(2, point6);
QualityIndicator<List<PointSolution>, Double> epsilon =
new Epsilon<PointSolution>(referenceFront) ;
List<PointSolution> front = FrontUtils.convertFrontToSolutionList(frontApproximation) ;
assertEquals(1.0, epsilon.evaluate(front), EPSILON);
}
becomes:
@Test
public void shouldComputeReturnTheCorrectValueCaseA() {
double[][] front = {{1.5, 4.0}, {2.0, 3.0}, {3.0, 2.0}};
double[][] referenceFront = {{1.0, 3.0}, {1.5, 2.0}, {2.0, 1.5}};
Assertions.assertEquals(1.0, new Epsilon(referenceFront).compute(front), EPSILON);
}
And like constraints, if you need to add boundaries, just create the dedicated property rather than an interface extension which requires its own implementations:
If you paid attention, you may have seen that I also removed the copy method from the interface.
I have to think about these ideas.
Thinking in the Solution
interface, I agree that
public interface Solution<T> extends Serializable {
List<T> variables() ;
List<Double> objectives() ;
List<Double> constraints() ;
Map<Object, Object> attributes();
}
is a more consistent definition than using arrays, but I'm concerned about making continuous conversions from lists to arrays. In a run of a metaheuristic, that conversion may need to be applied millions of times.
Generalizing operators to not depend on solutions is a good thing. But that is only about consuming the data directly instead of requesting them from the solution. It does not imply anything on the type of data itself.
For instance, passing from dominanceTest(solution1, solution2)
to dominanceTest(vector1, vector2)
appears to me to be a nice move. But these vectors can still be lists. There is only 4 lines which change:
public static int dominanceTest(List<Double> vector1, List<Double> vector2) { // here
int bestIsOne = 0;
int bestIsTwo = 0;
int result;
for (int i = 0; i < vector1.size(); i++) { // here
double value1 = vector1.get(i); // here
double value2 = vector2.get(i); // here
if (value1 != value2) {
if (value1 < value2) {
bestIsOne = 1;
}
if (value2 < value1) {
bestIsTwo = 1;
}
}
}
result = Integer.compare(bestIsTwo, bestIsOne);
return result;
}
What matters here is that lists are abstractions of arrays. By using arrays, anyone who want to use something else (e.g. LinkedList) must do the conversion. If they want to use arrays, they can just use the ArrayList implementation. If you use arrays, you give no choice at all, either you have arrays everywhere, or there is some conversions going ahead.
Lists do not forbid you to use arrays, it only means that you should use ArrayList if you want to use arrays. Not the primitive arrays.
Now, if you are really concerned about arrays vs. lists, there is a good intermediary, which are iterators. Just use iterators internaly ang offer two methods which can support both types:
public static int dominanceTest(List<Double> vector1, List<Double> vector2) {
Iterator<Double> iterator1 = vector1.iterator();
Iterator<Double> iterator2 = vector2.iterator();
return dominanceTest(iterator1, iterator2);
}
public static int dominanceTest(double[] vector1, double[] vector2) {
Iterator<Double> iterator1 = Arrays.stream(vector1).iterator();
Iterator<Double> iterator2 = Arrays.stream(vector2).iterator();
return dominanceTest(iterator1, iterator2);
}
private static int dominanceTest(Iterator<Double> iterator1, Iterator<Double> iterator2) {
int bestIsOne = 0;
int bestIsTwo = 0;
int result;
for (int i = 0; iterator1.hasNext(); i++) {
double value1 = iterator1.next();
double value2 = iterator2.next();
if (value1 != value2) {
if (value1 < value2) {
bestIsOne = 1;
}
if (value2 < value1) {
bestIsTwo = 1;
}
}
}
result = Integer.compare(bestIsTwo, bestIsOne);
return result;
}
Et voilà ! No particular conversion, since the iterator of the list just browse the list iteratively, and the stream over the array does the same. Then you can further optimize your algorithm to stop when you reach the end or both variables are at one. Meanwhile, having a int at 0 or 1 means having a boolean.
So, as a general policy, I would recommend to push for abstractions like List instead of using primitives like arrays to increase flexibility. But surely any implementation for one can be replaced by the other with this kind of trick. Here an iterator is OK because we just do one pass. In other cases it may need a different trick. But what you can do with one, you can do with the other.
After thinking about having this jMetal6 project, the situation is that now there are three jMetal projects:
The fact is many users still use jMetal 4, and I know that people finding jMetal for the first time frequently get that version. Having three different jMetal projects doesn't seem a good idea.
So, my idea is to reorganize the contents of the jMetal 6 repository to have all the contents of the 5.10 release, and move all the stuff related to the jMetal 6 in a sub-project called jmetal-experimental
, which now contains the former jmetal-auto
and the classes I have developed around a component-based template for evolutionary algorithms. This way, I avoid having jmetal5version
packages in the other sub-projects. When I finish this task, I plan to remove this project.
There is the matter of improving the project. I like the idea of rethinking everything and studying the adoption of alternatives as the Solution
interface proposed in this issue. However, adopting such an interface would require refactoring the whole project, which is something I would like to avoid. Instead, I would prefer to start a new project from scratch, with a name different to "jMetalsomething", where we could try any ideas without any constraint; in fact, I created a repository two years ago with that idea in mind: https://github.com/jMetal/moore.
We can do it iteratively, through incremental deprecations. This is what I started in my generalization branch. This is the best way I know for keeping everything and going iteratively without breaking anything, but it requires some discipline. Basically, you deprecate everything you plan to remove in the next major version, and you actually remove them only when you change the version. For each of your release, you anyway have a tag, so if someone needs a fix related to a specific version, just create a branch on that tag add a commit to fix it. This way, you don't impact other stuff.
Going with release branches is the way to go if you want to go iteratively. If you use maven sub-modules, you clearly cut the two, and with modularization you must pay attention not to have the same packages, otherwise you are going through trouble.
You can use the submodule to try stuff, and then integrate just some pieces in the main submodule once you like it, refactoring just what you need to integrate it.
I agree, but introducing changes such as your proposed Solution
interface is too disruptive (it affects problems, operators, algorithms) and we should have to refactor everything. I have had to cope with this kind of global refactoring many times, and I'm not willing to do it again.
From my point of view, I would prefer to start another project from scratch, where we could apply creative ideas, taking advantage of the full features of Java 11, without affecting the current project. Of course, we can dispose of all the stuff we already have in jMetal, so we could reuse everything we need.
Check out the interface I use in the generalization branch. Nothing disruptive here: https://github.com/jMetal/jMetal/pull/406/commits/1a81cb0b92ff49a30a7d7fcc3bcc0eb137d54edb
All I do is to add the methods to use abstractions, which build on the methods which don't. This way, nothing is broken, and we can progressively start to use them. With that, I add a generic implementation: https://github.com/jMetal/jMetal/pull/406/commits/deca799356c8a50408ecf32dfe2b9b6d9e3c4ff0
It overrides these methods to focus first on objects rather than arrays. But both are still here, which is what makes it easy to iteratively transform the rest.
Similarly, I didn't erased the DoubleSolution interface, which provides the double boundaries. But I created specific objects which reuse them if we do use a DoubleSolution: https://github.com/jMetal/jMetal/pull/406/commits/ed79bba42a9bf11a83aa347eb9a1072e870b3372 https://github.com/jMetal/jMetal/pull/406/commits/1215981aaaa534fd37aa7b4b16c21ed37ee6bbac
This allows to start using these abstractions to not depend on the specificities of the DoubleSolution interface, while all the other algorithms remain how they are. This allows to progressively refactor. And once they are not used anymore, just get rid of them.
Now, feel free to create a submodule to start from scratch. But keep in mind that I can help in integrating stuff in the main submodule, thus keeping the new submodule an experimental scope. Because if you don't want to take care of the refactoring tasks, I am willing to. Since it is the kind of task I am used to and take some pride in doing. I do a lot of refactoring in my personal projects and in my job too. So just ask for it when you would like something integrated, and I can come up with suggestions. And if you don't like it, just continue with your experimental submodule.
If I had to do it iteratively, here is the plan I would follow:
XxxSolution
extensions to the unique Solution<Xxx>
The first step is to simplify the types hierarchy by going back to the generic interface Solution
.Thus we need to We can rely on its attribute to store additional data:
Property
or alike) to read/write additional values from/in solutions : https://github.com/jMetal/jMetal/commit/ed79bba42a9bf11a83aa347eb9a1072e870b3372Solution
, like I did here for DoubleSolution
: https://github.com/jMetal/jMetal/commit/1215981aaaa534fd37aa7b4b16c21ed37ee6bbacXxxSolution
by the more generic Solution<Xxx>
At this point, we don't need the extensions anymore, but we are still generating solutions from the solutions themselves (copy) and from problems (createSolution), so they extensions are still in use. We must first allow their replacement:
Supplier
objects, with the current components (random, problem) used as default. Again, I will favor factory methods over constructors.UnaryOperator<Solution<T>>
, maybe making our on interface for readability, like SolutionCopier<T>.copy(solution)
. We can use Solution.copy()
as default again.Solution
implementationNow that all components can be extensively customized on their solution instanciations, let's exploit it:
Solution
, like here but still with the primitive arrays, abstraction is for later: https://github.com/jMetal/jMetal/commit/deca799356c8a50408ecf32dfe2b9b6d9e3c4ff0Solution
extensions and their implementations. Since they are only used in deprecated factory methods & constructors, no additional migration work should be necessary here.Solution.copy()
, since they are replaced by solution factory methods.At this point, we primarily rely on the generic Solution
implementation but it still uses arrays, so let's change that:
Solution
interface to provide abstractions, like here: https://github.com/jMetal/jMetal/commit/1a81cb0b92ff49a30a7d7fcc3bcc0eb137d54edbSolution
optimized for abstractions, like here: https://github.com/jMetal/jMetal/commit/deca799356c8a50408ecf32dfe2b9b6d9e3c4ff0Solution
extensions will have an additional conversion, although it should remain small if we smartly use streams/iterators where relevant.Solution
, since they have been replaced by the abstracted ones.Solution
usesSince get/set specific variables can be done through the already available list, no need for the additional getter and setter. Same thing for the others. So:
Solution.variables()
to return the list of variablesSolution.getVariable()
by variables().get()
Solution.setVariable()
by variables().set()
Solution.getNumberOfVariables()
by variables().size()
Solution.objectives()
to return the list of objectivesSolution.getObjective()
by objectives().get()
Solution.setObjective()
by objectives().set()
Solution.getNumberOfObjectives()
by objectives().size()
Solution.constraints()
to return the list of constraintsSolution.getConstraint()
by constraints().get()
Solution.setConstraint()
by constraints().set()
Solution.getNumberOfConstraints()
by constraints().size()
Solution.attributes()
to return the map of attributesSolution.getAttribute()
by attributes().get()
Solution.setAttribute()
by attributes().set()
Solution.hasAttribute()
by attributes().containsKey()
At this point, those using the deprecated constructors which still rely on the old Solution
extensions will be enforced to use the abstraction, so having a systematic additional conversion for objectives and constraints. However, this is the last run before the new major version, so it is really time to migrate now from this deprecated stuff.
If not done yet, it would be wise to plan a new major version here. By releasing the current state as the last minor version for the current major, people would just have to update to it to get all the deprecation warnings and go through them for proper migration (each deprecated element is documented with the @deprecated
javadoc tag to tell how to migrate).
By removing all the deprecated stuff and releasing as the next major version, those having migrated can just replace by the new major version with no problem.
Steps like creating a class are atomic, so they are done in one shot. Steps like replacing methods just go progressively, searching for current uses with Eclipse and replacing them one after another. They can be done all at once in a single commit, or just changing few of them at a time. This is what allows a progressive refactoring. And since we deprecate without removing anything, we never break existing code. Only the major version breaks, but only what has been already deprecated and provided in a previous version for users to migrate safely.
Worth noting, after the major version is released, all the deprecated stuff is gone. Especially the annoying sets of constructors plugged to Solution
extensions. So it opens the possibility to generalize all the relevant components to deal with more types (for now, we have only replaced extensions by their generic counter part, but never generalized further).
I have just updated https://github.com/jMetal/jMetal with the contents of the now defunct jMetal 6 project.
As I suppose that we cannot copy or move this issue to the other project, we could open a new one there with the contents of this post: https://github.com/jMetal/jMetal6/issues/3#issuecomment-662687720.
Then, I will delete this project.
Since we are on the "rethinking everything" side, may I suggest to push for a proper generic and simple solution interface?
This is what I started to do in my generalization branch for jMetal 5: https://github.com/jMetal/jMetal/pull/406/commits
Observations
Here are the facts:
List
, so we don't need to add get and set methods to interact with each variable, since theList
already provides them.List
too (which I did here, in a way which does not break the legacy code).Function<S, T>
). If it needs to write to it, it can require it too (BiConsumer<S, T>
). I made something a bit more integrated here.Suggestion
So basically, without loosing any functionality, the solution interface can be reduced to that:
solution.getVariable(index)
solution.variables().get(index)
solution.setVariable(index, value)
solution.variables().set(index, value)
solution.getNumberOfVariables()
solution.variables().size()
solution.getObjective(index)
solution.objectives().get(index)
solution.setObjective(index, value)
solution.objectives().set(index, value)
solution.getNumberOfObjectives()
solution.objectives().size()
solution.getAttribute(id)
solution.attributes().get(id)
solution.setAttribute(id, value)
solution.attributes().put(id, value)
solution.hasAttribute(id)
solution.attributes().containsKey(id)
solution.getConstraints()
constraintsProperty.read(solution)
solution.getConstraint(index)
constraintsProperty.read(solution).get(index)
solution.setConstraint(index, value)
constraintsProperty.read(solution).set(index, value)
solution.getNumberOfConstraints()
constraintsProperty.read(solution).size()
solution.getLowerBound(index)
lowerBounds.read(solution).get(index)
solution.setLowerBound(index, value)
lowerBounds.read(solution).set(index, value)
solution.getUpperBound(index)
upperBounds.read(solution).get(index)
solution.setUpperBound(index, value)
upperBounds.read(solution).set(index, value)
In the case of constraints and boundaries, they all apply to specific variables, so you have to deal with a list of values the same size as the variables list. Which means that for all these properties, you need a single class:
Actually, since we only use
read
tog et the list and then read or write it, we can actually replace thewrite
by an automatic generation. Or even combine the two strategies to provide more flexibility.What about
copy
?If you paid attention, you may have seen that I also removed the
copy
method from the interface. So aren't we loosing something here? Actually, it becomes useless, and here is why.We have no more extensions of the interface, since all extensions are dealt with properties. Which means that all the algorithms consume and produce
Solution
instances. Not even aS extends Solution
, but directlySolution
itself. No need to support extensions at all.Since all implementations are fine, then the algorithm can just pick the implementation it needs. Just optimize the choice based on the algorithm itself. All you need then is for this implementation to have a factory method to copy:
If you prefer to keep the possibility to customize the copying process, then rather than having the algorithm decide about it, make it an argument of the algorithm:
If the user wants to use a specific factory method, he just have to give it as a lambda: