jMetal / jMetal6

MIT License

Rethinking solutions #3

Open matthieu-vergne opened 4 years ago

matthieu-vergne commented 4 years ago

Since we are on the "rethinking everything" side, may I suggest to push for a proper generic and simple solution interface?

This is what I started to do in my generalization branch for jMetal 5: https://github.com/jMetal/jMetal/pull/406/commits

Observations

Here are the facts:

  1. Variables are already a List, so we don't need dedicated get and set methods to interact with each variable: the List already provides them.
  2. Objectives are stuck with arrays, which are poorly featured (especially in terms of behavior customization), so we should make them a proper List too (which I did here, in a way which does not break the legacy code).
  3. Any additional value, like bounds, is required only by some solutions, so it should not be part of the interface: don't bother the solutions which don't need it (and don't make them take more space than they need). I tend to think that constraints are the same: some solutions have constraints, others don't, so don't impose constraints on all of them.
  4. The attributes map is the best way to make dynamic extensions of the solution: just design a "property" or "metadata" object which represents an additional property of the solution (constraint, bound, etc.). If this property is specific to the solution, it can use the attributes map to store and retrieve it. If it is a constant, it just returns it (if all your solutions have the same constraints, don't make thousands of them store the same value). If an algorithm requires specific properties, it just asks for them (Function<S, T>). If it needs to write to them, it can require that too (BiConsumer<S, T>). I made something a bit more integrated here.
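To make point 4 concrete, here is a minimal sketch of the property idea (all class names and keys here are hypothetical, not jMetal API): a per-solution property backed by the attributes map, and a constant property that stores nothing per solution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class PropertySketch {
  // Minimal stand-in for the proposed Solution interface.
  interface Solution<T> {
    List<T> variables();
    List<Double> objectives();
    Map<Object, Object> attributes();
  }

  static <T> Solution<T> newSolution() {
    List<T> variables = new ArrayList<>();
    List<Double> objectives = new ArrayList<>();
    Map<Object, Object> attributes = new HashMap<>();
    return new Solution<T>() {
      public List<T> variables() { return variables; }
      public List<Double> objectives() { return objectives; }
      public Map<Object, Object> attributes() { return attributes; }
    };
  }

  // Per-solution property: a writer and a reader backed by the attributes map.
  static final Object CONSTRAINTS_KEY = "constraints";
  static final BiConsumer<Solution<?>, List<Double>> WRITE_CONSTRAINTS =
      (s, values) -> s.attributes().put(CONSTRAINTS_KEY, values);
  @SuppressWarnings("unchecked") // only List<Double> is ever stored under this key
  static final Function<Solution<?>, List<Double>> READ_CONSTRAINTS =
      s -> (List<Double>) s.attributes().get(CONSTRAINTS_KEY);

  // Constant property: every solution shares the value, so nothing is stored.
  static final Function<Solution<?>, List<Double>> LOWER_BOUNDS =
      s -> List.of(0.0, 0.0);

  public static void main(String[] args) {
    Solution<Double> solution = newSolution();
    WRITE_CONSTRAINTS.accept(solution, List.of(-1.0, 0.5));
    System.out.println(READ_CONSTRAINTS.apply(solution));
    System.out.println(LOWER_BOUNDS.apply(solution));
  }
}
```

An algorithm needing constraints would then just take a `Function<Solution<?>, List<Double>>` argument, without caring whether the values live in the map or are constants.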

Suggestion

So basically, without losing any functionality, the solution interface can be reduced to this:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  List<Double> objectives() ;
  Map<Object, Object> attributes();
}
And here are the removed methods and their new forms:

Legacy                                    New
solution.getVariable(index)               solution.variables().get(index)
solution.setVariable(index, value)        solution.variables().set(index, value)
solution.getNumberOfVariables()           solution.variables().size()
solution.getObjective(index)              solution.objectives().get(index)
solution.setObjective(index, value)       solution.objectives().set(index, value)
solution.getNumberOfObjectives()          solution.objectives().size()
solution.getAttribute(id)                 solution.attributes().get(id)
solution.setAttribute(id, value)          solution.attributes().put(id, value)
solution.hasAttribute(id)                 solution.attributes().containsKey(id)
solution.getConstraints()                 constraintsProperty.read(solution)
solution.getConstraint(index)             constraintsProperty.read(solution).get(index)
solution.setConstraint(index, value)      constraintsProperty.read(solution).set(index, value)
solution.getNumberOfConstraints()         constraintsProperty.read(solution).size()
And like constraints, if you need to add boundaries, just create the dedicated property rather than an interface extension which requires its own implementations:

Legacy                                    New
solution.getLowerBound(index)             lowerBounds.read(solution).get(index)
solution.setLowerBound(index, value)      lowerBounds.read(solution).set(index, value)
solution.getUpperBound(index)             upperBounds.read(solution).get(index)
solution.setUpperBound(index, value)      upperBounds.read(solution).set(index, value)

In the case of constraints and boundaries, they all apply to specific variables, so you have to deal with a list of values the same size as the variables list. Which means that for all these properties, you need a single class:

public class VariablesSizedProperty<T> {
  void write(Solution<?> s, List<T> values) {
    // One value per variable: enforce the size constraint on write.
    if (values.size() != s.variables().size()) {
      throw new IllegalArgumentException("Expected one value per variable");
    }
    s.attributes().put(this, values);
  }

  @SuppressWarnings("unchecked") // only List<T> is ever stored under this key
  List<T> read(Solution<?> s) {
    return (List<T>) s.attributes().get(this);
  }
}

Actually, since we only use read to get the list and then read or write it, we can replace write with automatic generation. Or even combine the two strategies to provide more flexibility.
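A sketch of that "automatic generation" variant, assuming a minimal Solution interface (names hypothetical): read() lazily creates the list on first access via computeIfAbsent, one default value per variable, so no explicit write is needed beforehand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class LazyVariablesSizedProperty<T> {
  // Minimal stand-in for the proposed Solution interface.
  interface Solution<T> {
    List<T> variables();
    Map<Object, Object> attributes();
  }

  private final Supplier<T> defaultValue;

  public LazyVariablesSizedProperty(Supplier<T> defaultValue) {
    this.defaultValue = defaultValue;
  }

  @SuppressWarnings("unchecked") // only List<T> is ever stored under this key
  public List<T> read(Solution<?> s) {
    return (List<T>) s.attributes().computeIfAbsent(this, key -> {
      // Generate a list with one slot per variable, filled with defaults.
      List<T> values = new ArrayList<>(s.variables().size());
      for (int i = 0; i < s.variables().size(); i++) {
        values.add(defaultValue.get());
      }
      return values;
    });
  }
}
```

Callers then mutate the returned list in place, which plays the role of the former write method.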

What about copy?

If you paid attention, you may have seen that I also removed the copy method from the interface. So aren't we losing something here? Actually, it becomes useless, and here is why.

We have no more extensions of the interface, since all extensions are dealt with through properties. Which means that all the algorithms consume and produce Solution instances. Not even an S extends Solution, but directly Solution itself. No need to support extensions at all.

Since any implementation will do, the algorithm can just pick the implementation it needs, optimizing the choice based on the algorithm itself. All you need then is for this implementation to have a factory method to copy:

static <T> Solution<T> createCopy(Solution<T> original) {
  // Create the instance based on the original
  // Whatever the original class was
}
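Such a factory method could be sketched as follows (the interface and class shapes are assumed, not actual jMetal code): copy variables, objectives and attributes into a fresh generic instance, whatever the original class was.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericSolutions {
  // Minimal stand-in for the proposed Solution interface.
  interface Solution<T> extends Serializable {
    List<T> variables();
    List<Double> objectives();
    Map<Object, Object> attributes();
  }

  // Deep-copy the three collections; the original can be any implementation.
  static <T> Solution<T> createCopy(Solution<T> original) {
    List<T> variables = new ArrayList<>(original.variables());
    List<Double> objectives = new ArrayList<>(original.objectives());
    Map<Object, Object> attributes = new HashMap<>(original.attributes());
    return new Solution<T>() {
      public List<T> variables() { return variables; }
      public List<Double> objectives() { return objectives; }
      public Map<Object, Object> attributes() { return attributes; }
    };
  }
}
```

Note the copy is independent of the original's class: mutating the copy's lists leaves the original untouched.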

If you prefer to keep the possibility to customize the copying process, then rather than having the algorithm decide about it, make it an argument of the algorithm:

UnaryOperator<Solution<T>> copier = // copier given in argument
// ...
Solution<T> copy = copier.apply(solution);

If the user wants to use a specific factory method, they just have to pass it as a lambda:

UnaryOperator<Solution<T>> copier = AwesomeSolution::createCopy;
algorithm = new MyAlgorithm(..., copier, ...);
matthieu-vergne commented 4 years ago

As a side note, all of that is what I planned to do iteratively, with successive deprecations. It would have been long, but progressive. So we can actually do it that way in jMetal 6 but more quickly, removing directly what we don't need anymore instead of deprecating.

ajnebro commented 4 years ago

I agree that the current Solution interface has become a bit cumbersome and it contains many redundant methods. However, I was thinking something like this:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  double[] objectives() ;
  double[] constraints() ;
  Map<Object, Object> attributes();
} 

Why use arrays? I mentioned to you that some components in jMetal can be generalized, in the sense of not having to depend on jMetal classes. For example, a method that is used intensively in many algorithms is dominanceTest(), which now is:

  private int dominanceTest(S solution1, S solution2) {
    int bestIsOne = 0;
    int bestIsTwo = 0;
    int result;
    for (int i = 0; i < solution1.getNumberOfObjectives(); i++) {
      double value1 = solution1.getObjective(i);
      double value2 = solution2.getObjective(i);
      if (value1 != value2) {
        if (value1 < value2) {
          bestIsOne = 1;
        }
        if (value2 < value1) {
          bestIsTwo = 1;
        }
      }
    }
    if (bestIsOne > bestIsTwo) {
      result = -1;
    } else if (bestIsTwo > bestIsOne) {
      result = 1;
    } else {
      result = 0;
    }
    return result;
  }

However, it can be implemented in this way:

  public static int dominanceTest(double[] vector1, double[] vector2) {
    int bestIsOne = 0;
    int bestIsTwo = 0;
    int result;
    for (int i = 0; i < vector1.length; i++) {
      double value1 = vector1[i];
      double value2 = vector2[i];
      if (value1 != value2) {
        if (value1 < value2) {
          bestIsOne = 1;
        }
        if (value2 < value1) {
          bestIsTwo = 1;
        }
      }
    }
    result = Integer.compare(bestIsTwo, bestIsOne);
    return result;
  }

Another example is the quality indicators. Instead of having an evaluate() method like this:

  @Override public Double evaluate(List<S> solutionList) {...}

it can be replaced by:

  @Override public double compute(double[][] front) {...}
ajnebro commented 4 years ago

I also want to get rid of the jmetal.util.front package, as a front is basically a double[][] matrix. So, a method like this:

  @Test
  public void shouldExecuteReturnTheCorrectValueCaseA() {
    int numberOfPoints = 3 ;
    int numberOfDimensions = 2 ;
    Front frontApproximation = new ArrayFront(numberOfPoints, numberOfDimensions);
    Front referenceFront = new ArrayFront(numberOfPoints, numberOfDimensions);

    Point point1 = new ArrayPoint(numberOfDimensions) ;
    point1.setValue(0, 1.5);
    point1.setValue(1, 4.0);
    Point point2 = new ArrayPoint(numberOfDimensions) ;
    point2.setValue(0, 2.0);
    point2.setValue(1, 3.0);
    Point point3 = new ArrayPoint(numberOfDimensions) ;
    point3.setValue(0, 3.0);
    point3.setValue(1, 2.0);

    frontApproximation.setPoint(0, point1);
    frontApproximation.setPoint(1, point2);
    frontApproximation.setPoint(2, point3);

    Point point4 = new ArrayPoint(numberOfDimensions) ;
    point4.setValue(0, 1.0);
    point4.setValue(1, 3.0);
    Point point5 = new ArrayPoint(numberOfDimensions) ;
    point5.setValue(0, 1.5);
    point5.setValue(1, 2.0);
    Point point6 = new ArrayPoint(numberOfDimensions) ;
    point6.setValue(0, 2.0);
    point6.setValue(1, 1.5);

    referenceFront.setPoint(0, point4);
    referenceFront.setPoint(1, point5);
    referenceFront.setPoint(2, point6);

    QualityIndicator<List<PointSolution>, Double> epsilon =
        new Epsilon<PointSolution>(referenceFront) ;

    List<PointSolution> front = FrontUtils.convertFrontToSolutionList(frontApproximation) ;
    assertEquals(1.0, epsilon.evaluate(front), EPSILON);
  }

becomes:

  @Test
  public void shouldComputeReturnTheCorrectValueCaseA() {
    double[][] front = {{1.5, 4.0}, {2.0, 3.0}, {3.0, 2.0}};
    double[][] referenceFront = {{1.0, 3.0}, {1.5, 2.0}, {2.0, 1.5}};

    Assertions.assertEquals(1.0, new Epsilon(referenceFront).compute(front), EPSILON);
  }
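For illustration, a minimal array-based additive-epsilon computation could look like this sketch (the class shape is assumed here; the actual jMetal implementation may differ). It reproduces the expected value 1.0 for the test data above.

```java
// Additive epsilon indicator: the smallest eps such that subtracting eps from
// every coordinate of the front makes it weakly dominate the reference front.
public class Epsilon {
  private final double[][] referenceFront;

  public Epsilon(double[][] referenceFront) {
    this.referenceFront = referenceFront;
  }

  public double compute(double[][] front) {
    double eps = Double.NEGATIVE_INFINITY;
    for (double[] reference : referenceFront) {
      // For each reference point, find the front point needing the least shift.
      double minOverFront = Double.POSITIVE_INFINITY;
      for (double[] point : front) {
        double maxOverDims = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < reference.length; k++) {
          maxOverDims = Math.max(maxOverDims, point[k] - reference[k]);
        }
        minOverFront = Math.min(minOverFront, maxOverDims);
      }
      eps = Math.max(eps, minOverFront);
    }
    return eps;
  }
}
```

No Front, Point, or PointSolution objects involved: the indicator works on plain matrices.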
ajnebro commented 4 years ago

And like constraints, if you need to add boundaries, just create the dedicated property rather than an interface extension which requires its own implementations:

If you paid attention, you may have seen that I also removed the copy method from the interface.

I have to think about these ideas.

ajnebro commented 4 years ago

Thinking in the Solution interface, I agree that

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  List<Double> objectives() ;
  List<Double> constraints() ;
  Map<Object, Object> attributes();
} 

is a more consistent definition than using arrays, but I'm concerned about making continuous conversions from lists to arrays. In a run of a metaheuristic, that conversion may need to be applied millions of times.

matthieu-vergne commented 4 years ago

Generalizing operators to not depend on solutions is a good thing. But that is only about consuming the data directly instead of requesting them from the solution. It does not imply anything on the type of data itself.

For instance, passing from dominanceTest(solution1, solution2) to dominanceTest(vector1, vector2) appears to me to be a nice move. But these vectors can still be lists. Only 4 lines change:

public static int dominanceTest(List<Double> vector1, List<Double> vector2) { // here
  int bestIsOne = 0;
  int bestIsTwo = 0;
  int result;
  for (int i = 0; i < vector1.size(); i++) { // here
    double value1 = vector1.get(i); // here
    double value2 = vector2.get(i); // here
    if (value1 != value2) {
      if (value1 < value2) {
        bestIsOne = 1;
      }
      if (value2 < value1) {
        bestIsTwo = 1;
      }
    }
  }
  result = Integer.compare(bestIsTwo, bestIsOne);
  return result;
}

What matters here is that lists are abstractions over arrays. By using arrays, anyone who wants to use something else (e.g. LinkedList) must do the conversion. If they want to use arrays, they can just use the ArrayList implementation. If you impose arrays, you give no choice at all: either you have arrays everywhere, or there are conversions going on.

Lists do not forbid you from using arrays; it only means that you should use ArrayList, not primitive arrays, when you want array-backed storage.

Now, if you are really concerned about arrays vs. lists, there is a good intermediary: iterators. Just use iterators internally and offer two methods which can support both types:

public static int dominanceTest(List<Double> vector1, List<Double> vector2) {
  Iterator<Double> iterator1 = vector1.iterator();
  Iterator<Double> iterator2 = vector2.iterator();
  return dominanceTest(iterator1, iterator2);
}

public static int dominanceTest(double[] vector1, double[] vector2) {
  Iterator<Double> iterator1 = Arrays.stream(vector1).iterator();
  Iterator<Double> iterator2 = Arrays.stream(vector2).iterator();
  return dominanceTest(iterator1, iterator2);
}

private static int dominanceTest(Iterator<Double> iterator1, Iterator<Double> iterator2) {
  int bestIsOne = 0;
  int bestIsTwo = 0;
  while (iterator1.hasNext()) {
    double value1 = iterator1.next();
    double value2 = iterator2.next();
    if (value1 != value2) {
      if (value1 < value2) {
        bestIsOne = 1;
      }
      if (value2 < value1) {
        bestIsTwo = 1;
      }
    }
  }
  return Integer.compare(bestIsTwo, bestIsOne);
}

Et voilà ! No particular conversion: the iterator of the list just browses it, and the stream over the array does the same. You can then further optimize the algorithm to stop as soon as you reach the end or both variables are at one. Meanwhile, an int that only ever holds 0 or 1 is really a boolean.
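Those two optimizations (early exit, booleans instead of 0/1 ints) could be sketched like this (a variant of the iterator-based method above, not part of jMetal):

```java
import java.util.Iterator;
import java.util.List;

public class Dominance {
  public static int dominanceTest(Iterator<Double> iterator1, Iterator<Double> iterator2) {
    boolean bestIsOne = false;
    boolean bestIsTwo = false;
    while (iterator1.hasNext()) {
      double value1 = iterator1.next();
      double value2 = iterator2.next();
      bestIsOne |= value1 < value2;
      bestIsTwo |= value2 < value1;
      if (bestIsOne && bestIsTwo) {
        return 0; // each side is better somewhere: mutually non-dominated
      }
    }
    // -1: first dominates; 1: second dominates; 0: equal
    return Boolean.compare(bestIsTwo, bestIsOne);
  }

  public static int dominanceTest(List<Double> vector1, List<Double> vector2) {
    return dominanceTest(vector1.iterator(), vector2.iterator());
  }
}
```

The early return avoids scanning the remaining objectives once non-dominance is established.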

So, as a general policy, I would recommend pushing for abstractions like List instead of primitives like arrays, to increase flexibility. But surely any implementation based on one can be replaced by the other with this kind of trick. Here an iterator is OK because we just do one pass; other cases may need a different trick. But what you can do with one, you can do with the other.

ajnebro commented 4 years ago

After thinking about having this jMetal6 project, the situation is that there are now three jMetal projects.

The fact is many users still use jMetal 4, and I know that people finding jMetal for the first time frequently get that version. Having three different jMetal projects doesn't seem a good idea.

So, my idea is to reorganize the contents of the jMetal 6 repository to hold all the contents of the 5.10 release, and move all the stuff related to jMetal 6 into a sub-project called jmetal-experimental, which now contains the former jmetal-auto and the classes I have developed around a component-based template for evolutionary algorithms. This way, I avoid having jmetal5version packages in the other sub-projects. When I finish this task, I plan to remove this project.

Then there is the matter of improving the project. I like the idea of rethinking everything and studying the adoption of alternatives such as the Solution interface proposed in this issue. However, adopting such an interface would require refactoring the whole project, which is something I would like to avoid. Instead, I would prefer to start a new project from scratch, with a name different from "jMetalsomething", where we could try any ideas without any constraints; in fact, I created a repository two years ago with that idea in mind: https://github.com/jMetal/moore.

matthieu-vergne commented 4 years ago

We can do it iteratively, through incremental deprecations. This is what I started in my generalization branch. It is the best way I know to keep everything and proceed iteratively without breaking anything, but it requires some discipline. Basically, you deprecate everything you plan to remove in the next major version, and you actually remove it only when you bump the version. For each release, you have a tag anyway, so if someone needs a fix related to a specific version, just create a branch on that tag and add a commit to fix it. This way, you don't impact other stuff.

Going with release branches is the way to go if you want to proceed iteratively. If you use Maven sub-modules, you clearly separate the two, but with modularization you must pay attention not to have the same packages in both, otherwise you are heading for trouble.

matthieu-vergne commented 4 years ago

You can use the submodule to try stuff, and then integrate just some pieces in the main submodule once you like it, refactoring just what you need to integrate it.

ajnebro commented 4 years ago

I agree, but introducing changes such as your proposed Solution interface is too disruptive (it affects problems, operators, algorithms) and we would have to refactor everything. I have had to cope with this kind of global refactoring many times, and I'm not willing to do it again.

From my point of view, I would prefer to start another project from scratch, where we could apply creative ideas, taking advantage of the full features of Java 11, without affecting the current project. Of course, we have all the stuff already in jMetal at our disposal, so we could reuse everything we need.

matthieu-vergne commented 4 years ago

Check out the interface I use in the generalization branch. Nothing disruptive here: https://github.com/jMetal/jMetal/pull/406/commits/1a81cb0b92ff49a30a7d7fcc3bcc0eb137d54edb

All I do is add methods which use abstractions, built on top of the methods which don't. This way, nothing is broken, and we can progressively start to use them. With that, I add a generic implementation: https://github.com/jMetal/jMetal/pull/406/commits/deca799356c8a50408ecf32dfe2b9b6d9e3c4ff0

It overrides these methods to focus first on objects rather than arrays. But both are still here, which is what makes it easy to iteratively transform the rest.

Similarly, I didn't erase the DoubleSolution interface, which provides the double boundaries. But I created specific objects which reuse it when we do use a DoubleSolution: https://github.com/jMetal/jMetal/pull/406/commits/ed79bba42a9bf11a83aa347eb9a1072e870b3372 https://github.com/jMetal/jMetal/pull/406/commits/1215981aaaa534fd37aa7b4b16c21ed37ee6bbac

This allows us to start using these abstractions so as not to depend on the specificities of the DoubleSolution interface, while all the other algorithms remain as they are. This enables progressive refactoring. And once they are not used anymore, just get rid of them.

Now, feel free to create a submodule to start from scratch. But keep in mind that I can help with integrating stuff into the main submodule, keeping the new submodule as an experimental scope. If you don't want to take care of the refactoring tasks, I am willing to: it is the kind of task I am used to and take some pride in doing. I do a lot of refactoring in my personal projects and in my job too. So just ask when you would like something integrated, and I can come up with suggestions. And if you don't like it, just continue with your experimental submodule.

matthieu-vergne commented 4 years ago

If I had to do it iteratively, here is the plan I would follow:

Generalize from the many XxxSolution extensions to the unique Solution<Xxx>

The first step is to simplify the type hierarchy by going back to the generic interface Solution. We can rely on its attributes map to store additional data:

Allow the use of custom solution generators

At this point, we don't need the extensions anymore, but we are still generating solutions from the solutions themselves (copy) and from problems (createSolution), so the extensions are still in use. We must first allow their replacement:

Use of a generic Solution implementation

Now that all components can be extensively customized in how they instantiate solutions, let's exploit it:

Replace primitive arrays by abstractions

At this point, we primarily rely on the generic Solution implementation, but it still uses arrays, so let's change that:

Simplify Solution uses

Since getting/setting specific variables can be done through the already available list, there is no need for the additional getter and setter. Same thing for the others. So:

At this point, those using the deprecated constructors which still rely on the old Solution extensions will be forced to use the abstraction, incurring a systematic additional conversion for objectives and constraints. However, this is the last stretch before the new major version, so it is really time to migrate away from this deprecated stuff.

New major version

If not done yet, it would be wise to plan a new major version here. By releasing the current state as the last minor version for the current major, people would just have to update to it to get all the deprecation warnings and go through them for proper migration (each deprecated element is documented with the @deprecated javadoc tag to tell how to migrate).

By removing all the deprecated stuff and releasing as the next major version, those having migrated can just replace by the new major version with no problem.
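The deprecate-then-remove pattern described above might look like this sketch (the interface name and methods here are hypothetical): legacy methods stay as default methods delegating to the new abstractions, each carrying a @deprecated tag that tells users how to migrate.

```java
import java.io.Serializable;
import java.util.List;

public interface LegacyBridgedSolution<T> extends Serializable {
  // The new abstraction, kept after the major version.
  List<T> variables();

  /** @deprecated use {@code variables().get(index)} instead. */
  @Deprecated
  default T getVariable(int index) {
    return variables().get(index);
  }

  /** @deprecated use {@code variables().size()} instead. */
  @Deprecated
  default int getNumberOfVariables() {
    return variables().size();
  }
}
```

Releasing this as the last minor version surfaces the deprecation warnings; the next major version simply deletes the default methods.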

matthieu-vergne commented 4 years ago

Steps like creating a class are atomic, so they are done in one shot. Steps like replacing methods can go progressively: search for current uses with Eclipse and replace them one after another. They can be done all at once in a single commit, or by changing just a few of them at a time. This is what allows a progressive refactoring. And since we deprecate without removing anything, we never break existing code. Only the major version breaks, and only what had already been deprecated in a previous version so users could migrate safely.

matthieu-vergne commented 4 years ago

Worth noting: after the major version is released, all the deprecated stuff is gone, especially the annoying sets of constructors tied to Solution extensions. This opens the possibility of generalizing all the relevant components to deal with more types (for now, we have only replaced extensions by their generic counterparts, never generalized further).

ajnebro commented 4 years ago

I have just updated https://github.com/jMetal/jMetal with the contents of the now defunct jMetal 6 project.

As I suppose that we cannot copy or move this issue to the other project, we could open a new one there with the contents of this post: https://github.com/jMetal/jMetal6/issues/3#issuecomment-662687720.

Then, I will delete this project.

matthieu-vergne commented 4 years ago

https://github.com/jMetal/jMetal/issues/411