
jMetal: a framework for multi-objective optimization with metaheuristics
https://jmetal.readthedocs.io/en/latest/index.html
MIT License

Problem-solution spaghetti code #71

Closed matthieu-vergne closed 2 years ago

matthieu-vergne commented 9 years ago

I am currently trying to fix the warnings about missing generic type parameters, and I see that you have spaghetti code between the problem and solution implementations.

A solution is something with rather many properties:

public interface Solution<T> extends Serializable {
  public void setObjective(int index, double value) ;
  public double getObjective(int index) ;

  public T getVariableValue(int index) ;
  public void setVariableValue(int index, T value) ;
  public String getVariableValueString(int index) ;

  public int getNumberOfVariables() ;
  public int getNumberOfObjectives() ;

  public double getOverallConstraintViolationDegree() ;
  public void setOverallConstraintViolationDegree(double violationDegree) ;

  public int getNumberOfViolatedConstraints() ;
  public void setNumberOfViolatedConstraints(int numberOfViolatedConstraints) ;

  public Solution<T> copy() ;

  public void setAttribute(Object id, Object value) ;
  public Object getAttribute(Object id) ;
}

Why not. I think there is a lot to remove, but let us leave that for later.

The point is that you have provided a generic implementation through AbstractGenericSolution, and here is the issue:

public abstract class AbstractGenericSolution<T, P extends Problem> implements Solution<T> {
  ...
}

Thus, the solution depends on the problem. It may seem natural on an informal basis, but look at the definition of a problem:

public interface Problem<S extends Solution> extends Serializable {
  /* Getters */
  public int getNumberOfVariables() ;
  public int getNumberOfObjectives() ;
  public int getNumberOfConstraints() ;
  public String getName() ;

  /* Methods */
  public void evaluate(S solution) ;
  public S createSolution() ;
}

So here you see that a problem also depends on solutions, hence a loop. Loops are a hint that responsibilities are not well delegated: to do its job, the first needs the second, which, to do its own job, needs the first. As an additional hint, both problems and solutions expose the number of variables and objectives, and while the solution has the number of violated constraints, the number of constraints itself is found only in the problem. To me, this shows a clear lack of separation of responsibilities, and it should be fixed: while I was trying to figure out what happens in the implementations in order to fix the missing generics, it was simply confusing, the changes cascaded everywhere, and I actually had to change the code itself.

To me, it makes sense to have all this objective and variable stuff managed by the problem, because that is where the semantics live. However, because each objective value relates to a specific solution, one should access it through the corresponding solution. And I see something here which could be used to avoid the redundancy: you can store all these values as attributes, since the solution already provides this facility. Thus, the problem can provide not only the number of objectives/variables, but also how to compute the objectives from the variables, how to manage the constraints, etc. From the point of view of the solution, all of that is stored as attributes which can be retrieved through the problem.
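As a minimal sketch of this idea (all names here are illustrative, not jMetal API): the solution is a bare attribute store, and a toy single-objective problem both defines the attribute IDs and computes the objective, storing it back as an attribute.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the solution is a bare attribute store.
class AttributeSolution {
  private final Map<Object, Object> attributes = new HashMap<>();

  public void setAttribute(Object id, Object value) { attributes.put(id, value); }
  public Object getAttribute(Object id) { return attributes.get(id); }
}

// Toy problem: minimize the sum of the variables. The problem owns the
// semantics (which attributes exist, how the objective is computed).
class SumProblem {
  public static final Object VARIABLES = "variables";  // ID for the decision variables
  public static final Object OBJECTIVE = "objective";  // ID for the objective value

  public AttributeSolution createSolution(List<Double> variables) {
    AttributeSolution solution = new AttributeSolution();
    solution.setAttribute(VARIABLES, variables);
    return solution;
  }

  // The problem computes the objective and stores it back as an attribute.
  @SuppressWarnings("unchecked")
  public void evaluate(AttributeSolution solution) {
    List<Double> variables = (List<Double>) solution.getAttribute(VARIABLES);
    double sum = variables.stream().mapToDouble(Double::doubleValue).sum();
    solution.setAttribute(OBJECTIVE, sum);
  }
}
```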

I have not looked at how you designed all this attribute stuff in detail, I only noticed it in the GECCO paper. But the redundancy should certainly be fixed, and this seems to me a promising way to do so. All the semantics would be stored in the Problem, and the Solution would be nothing more than a customizable store of attributes:

public interface Solution extends Serializable {
  public Solution copy() ;

  public void setAttribute(Object id, Object value) ;
  public Object getAttribute(Object id) ;
}

This way, most of the dedicated solution classes would probably disappear, and you could centralize all the highly specific logic into the dedicated problems which need it, rather than having a dedicated problem AND a dedicated solution entangled through spaghetti code.

ajnebro commented 9 years ago

I have to say that the idea of such a simple Solution interface is very attractive to me (it reminds me of some file systems where everything in a file is an attribute, including the data).

The point is that any solution certainly has a list of variables and a list of objective values, and both the number of variables and the number of objectives must be offered to the client code. The constraints are only needed when solving constrained problems, so information about them is a candidate to be stored as attributes.

I also agree with this:

So here you see that a problem also depends on solutions, thus a loop. Loops are hints that responsibilities are not well delegated, because to do its job the first needs the second which, to do its own job, needs the first.

A solution must be related to the problem to know about the problem's properties (number of variables, etc.), as you mentioned, and the problem is in charge of creating and evaluating solutions. The current approach allows relating solutions, operators and problems, as we mentioned in the GECCO paper, to ensure that the compiler (and the IDE) checks that e.g. a binary problem uses binary operators which are applied to binary solutions.

Time to think about these ideas :)

matthieu-vergne commented 9 years ago

Yes, and this spaghetti code makes me unable to figure out how to fill in the generics that were left empty, even though that is the whole point: "as we mentioned in the GECCO paper, to ensure that the compiler (and the IDE) checks that e.g. a binary problem uses binary operators which are applied to binary solutions."

I was expecting you to tell me that solutions necessarily have variables and objectives. Now the point is to see whether this information provides an advantage at the solution level or not. If yes, then it should be made clear what is managed by the solution and what is managed by the problem; otherwise, everything should be managed by the problem, and the solution should be oblivious to variable attributes, objective attributes, and other attributes. Here is my vision:

Attributes and their Accessors

If I am correct, you have this:

public interface SolutionAttribute<S extends Solution, V> {
  public void setAttribute(S solution, V value) ;
  public V getAttribute(S solution) ;
  public Object getAttributeID() ;
}

that I would personally revise in this more generic way:

public interface AttributeAccessor<Item, Value> {
  public void setAttribute(Item item, Value value) ;
  public Value getAttribute(Item item) ;
}

Why remove the dependency on the solution?

Because (i) I don't see its advantage at this level, and (ii) if we later find that attributes can be used elsewhere, the interface is already ready for that.

Why remove the ID?

Because an ID, i.e. an identifier, is a way to identify: it should provide a unique way to qualify the object, such that having the ID allows you to retrieve the instance. The point is that such an extra ID makes no sense outside of a context, so defining it as an intrinsic property of the attribute itself is prone to mistakes: you cannot forbid another attribute from having the same ID that way. However, it makes sense to give the attribute a name (and why not a description) for display purposes, and you could extend DescribedEntity for that. Now, I completely agree that an ID is necessary to know which attribute we are speaking about, and there are two kinds of IDs I see we can use reliably.

The first kind of ID is one provided by the problem itself, because the problem is the one which knows which attributes (incl. variables and objectives) should be stored in the solution. You could object that the algorithm also needs to store attributes it uses internally, but this can be dealt with by asking the problem to provide n free IDs (which do not conflict with the others). Of course, you can reverse the reasoning: the algorithm provides the IDs and should provide n free IDs for the problem (notice that n can be dynamic, like List<Object> getFreeIDs(int n)). I think the problem-based version is easier to imagine than the algorithm-based one, but we can basically do both. However, because we can do both, neither is better than the other unless we add preferences, thus subjectivity, so maybe we can imagine another option which is objectively better than these two.

The second kind of ID, which I claim is better than the previous ones, is an intrinsic ID (not something which has to be added manually), thus an ID which can be used independently of the context: the instance itself. Since any object can serve as an ID, you can use the SolutionAttribute or AttributeAccessor directly as its own ID. This means that, to access an attribute, you must preserve the instances, but in object-oriented programming that is the whole point: create a single object and reuse it where it is supposed to be reused, rather than re-creating it.
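A minimal sketch of this intrinsic-ID idea, with hypothetical names: the accessor stores and retrieves its value using `this` as the map key, so two accessor instances can never collide, even if they hold values of the same type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical item with a generic attribute map.
class AttributeStore {
  private final Map<Object, Object> attributes = new HashMap<>();
  Map<Object, Object> attributes() { return attributes; }
}

// The accessor instance itself is the storage key: no separate ID
// has to be invented or kept unique by hand.
class AttributeAccessor<Value> {
  @SuppressWarnings("unchecked")
  public Value getAttribute(AttributeStore item) {
    return (Value) item.attributes().get(this);  // "this" acts as the ID
  }

  public void setAttribute(AttributeStore item, Value value) {
    item.attributes().put(this, value);
  }
}
```

Because each instance is a distinct key, two accessors of the same value type read and write independent slots of the same store.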

Impact on Solution

From my point of view, you can revise Solution that way:

public interface Solution extends Serializable {
  public Solution copy() ;

  public <V> void setAttribute(SolutionAttribute<Solution, V> attribute, V value) ;
  public <V> V getAttribute(SolutionAttribute<Solution, V> attribute) ;
}

or with my revised attribute:

public interface Solution extends Serializable {
  public Solution copy() ;

  public <Value> void setAttribute(AttributeAccessor<Solution, Value> accessor, Value value) ;
  public <Value> Value getAttribute(AttributeAccessor<Solution, Value> accessor) ;
}

Notice two things:

Towards a Clear Algorithm/Problem Separation

If I push the reasoning to its end, I would also add an Attributable interface providing the setter/getter that you have in Solution:

public interface Attributable<Item> {
  public <Value> void setAttribute(AttributeAccessor<Item, Value> accessor, Value value) ;
  public <Value> Value getAttribute(AttributeAccessor<Item, Value> accessor) ;
}

such that:

public interface Solution extends Serializable, Attributable<Solution> {
  public Solution copy() ;
}

Maybe you already see where I am going, but if not, let me say it straight: the Solution interface could completely disappear.

Basically, the problem should provide problem-specific attribute accessors, the algorithm should provide algorithm-specific attribute accessors, and for each individual you should have a generic storage which holds the attribute values specific to that individual. If you remember our discussion about the interpretation of solutions (#22, "The problem-algorithm duality"), here it comes back: you have the algorithm encoding and the problem encoding. Here, the algorithm encoding corresponds to the generic storage accessed through the algorithm-specific attributes, and the problem encoding corresponds to the generic storage accessed through the problem-specific attributes (thus AttributeAccessor). If you remove the storage, you have no values, so nothing to encode; if you remove the attributes, you have no semantics (neither problem nor algorithm), so nothing you can really call an encoding, just meaningless data. Thus, we have a clear separation between algorithm-specific data and problem-specific data, with the common part (which links the two) being the storage, which is nothing more than an Attributable. No need for a Solution. What about copy()? Use clone() instead of creating your own; it is exactly the same for your purpose.
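A hedged sketch of what such a generic storage could look like (the names are illustrative, not jMetal API): a map-backed store whose keys are the accessor instances, with clone() playing the role of copy(). Problem-specific and algorithm-specific accessors would both read and write this same store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generic storage: the only shared part between the
// problem encoding and the algorithm encoding of an individual.
class GenericStore implements Cloneable {
  private Map<Object, Object> attributes = new HashMap<>();

  public <Value> void setAttribute(Object accessor, Value value) {
    attributes.put(accessor, value);
  }

  @SuppressWarnings("unchecked")
  public <Value> Value getAttribute(Object accessor) {
    return (Value) attributes.get(accessor);
  }

  @Override
  public GenericStore clone() {
    try {
      GenericStore copy = (GenericStore) super.clone();
      copy.attributes = new HashMap<>(attributes);  // shallow copy of the attribute map
      return copy;
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);  // cannot happen: this class implements Cloneable
    }
  }
}
```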

Concrete Examples

I have implemented simple algorithms (Hill Climbing) for a simple problem (TSP) with a few attributes. I let you check it out: it is in my repo, in the jmetal-core module, in the experimental package. You can pull it into your own repo if you want, or just make an independent clone on your machine.

I made them to experiment a bit and see what this approach could provide. From what I have seen so far, just as measures provide useful support for the generalisability of algorithms, attributes provide useful support for the generalisability of solutions. However, just as measures should not be imposed on algorithms, to avoid useless heaviness, attributes should not be imposed on solutions; more generally, I think that making an algorithm necessarily dependent on a fixed solution concept (interface or class) would be too much. You can look at the generic subpackage and my generic Hill Climbing algorithm for an example of an algorithm which does not care about which kind of solution it manages, and at the implementation in the specific package for an algorithm which does not use any Solution stuff but uses its own.

Reading recommendation:

  1. AttributeAccessor
  2. AttributeReader
  3. AttributeWriter
  4. AttributeControler
  5. ComputedAttributeReader
  6. SolutionStore
  7. PracticalContext
  8. TSPSolver
  9. generic and specific packages
matthieu-vergne commented 9 years ago

I just had an idea.

You agree that, in the absolute, we are dealing with "attributes", whether they are variables, objectives or other properties. If you checked the code mentioned above, you may have noticed that I have split reading and writing into AttributeReader/Writer.

Having the reading alone makes sense because a computed attribute (like the length of a path in a TSP) can be derived immediately from the other attributes. And while it makes sense to cache it, that is a background requirement: it does not make sense to give write access to the user. Thus, an interface with only the reading makes sense.

On the other hand, having the writing alone does not: if you have enough authority to change something, it seems weird not to have enough authority to at least know the current value. Thus, while AttributeWriter (write only) does not make sense on its own, AttributeControler (read & write) does.
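To illustrate the read-only case with the TSP path length mentioned above, here is a hypothetical reader whose value is derived on the fly from the path (the interface shape, class names and distance matrix are assumptions for the example, not code from either repo):

```java
// Read-only access: the value is computed, so no setter is exposed.
interface AttributeReader<Item, Value> {
  Value getAttribute(Item item);
}

// Computed attribute: the length of a closed TSP tour, derived from
// the path itself. For brevity the "solution" here is just the path.
class PathLengthReader implements AttributeReader<int[], Double> {
  private final double[][] distances;  // symmetric distance matrix between cities

  PathLengthReader(double[][] distances) { this.distances = distances; }

  @Override
  public Double getAttribute(int[] path) {
    double length = 0;
    for (int i = 0; i < path.length; i++) {
      // closed tour: the last city links back to the first
      length += distances[path[i]][path[(i + 1) % path.length]];
    }
    return length;
  }
}
```

Caching the computed value would be an implementation detail hidden behind the same read-only interface.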

Now, we already have a similar pair of interfaces. Or at least I have one, and you have one.

If we look at the algorithms, we now have the measures. These give access to read-only properties of algorithms: read only because measures are intrinsic properties of the algorithm, so they are not supposed to be set by external entities. Additionally, although you do not have this one yet, algorithms have parameters. A parameter is a property of an algorithm such that changing it affects the behaviour of the algorithm. As such, a parameter is a read & write property: read because you should be able to know its current value, and write because, by definition, you should be able to change it to affect the algorithm.

And here we find the two similar interfaces that we want for the solutions:

  • variables are to solutions what parameters are to algorithms, such that changing a variable of a solution affects its "value".
  • objectives are to solutions what measures are to algorithms, because they are inferred automatically from the solution within the context in which it is evaluated.

Consequently, the read only "attributes" could correspond to "objectives" while the read & write would be "variables". This would give:

public interface Variable<Solution, Value> {
  public void set(Solution solution, Value value);
  public Value get(Solution solution);
}

and:

public interface Objective<Solution, Value> {
  public Value get(Solution solution);
}

This way, we make a clear difference between the two.
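A small sketch of the two interfaces in use, on a toy one-variable solution (Point, XVariable and SquareObjective are invented for the example): the variable is read & write, while the objective is read-only and inferred from the variable.

```java
// The two interfaces as proposed above.
interface Variable<Solution, Value> {
  void set(Solution solution, Value value);
  Value get(Solution solution);
}

interface Objective<Solution, Value> {
  Value get(Solution solution);
}

// Toy solution with a single real-valued variable.
class Point {
  double x;
}

class XVariable implements Variable<Point, Double> {
  public void set(Point p, Double value) { p.x = value; }
  public Double get(Point p) { return p.x; }
}

// Read-only objective computed from the variable: f(x) = x^2.
class SquareObjective implements Objective<Point, Double> {
  private final Variable<Point, Double> x;

  SquareObjective(Variable<Point, Double> x) { this.x = x; }

  public Double get(Point p) {
    double v = x.get(p);
    return v * v;
  }
}
```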

Now, a potential question would be "why not reuse parameters and measures?". The point here is that there is a fundamental difference:

This difference justifies that parameters/measures:

while variables/objectives:

The only missing piece, then, is the constraint. What do you think about that? As it gives homogeneous interfaces, I think it is a good idea.

matthieu-vergne commented 9 years ago

And because they are provided by the algorithm, we could add to the algorithm a VariableManager and an ObjectiveManager, like the MeasureManager. I don't mean that we necessarily need dedicated interfaces, but we need equivalent methods to get the variables and objectives from the algorithm. This way, we homogenize everything and the learning curve is significantly reduced.

ajnebro commented 9 years ago

I have implemented simple algorithms (Hill Climbing) for a simple problem (TSP) with a few attributes. I let you check it out. It is on my repo, in the jmetal-core module, in the experimental package. You can pull it on your own repo if you want or just make an independent clone on your machine.

I have taken a look at your code. From my point of view, which is clearly biased ;), you are reimplementing stuff already in jMetal:

Thus, we have a clear separation between algorithm-specific data and problem-specific data, with the common part (which makes the link between both) being the storage, which is nothing more than an Attributable. No need for a Solution. What about copy()? Use clone() instead of creating your own. It is exactly the same for your purpose.

I understand now the idea of not needing a Solution class, but then the resulting framework would not be a new version of jMetal, it would be a quite different tool. The architecture of jMetal is based on clearly separating the concepts of algorithm, problem, operator and solution, and you are not using them in your hill climbing algorithm.

For example, how would you deal with a bi-objective version of the TSP (e.g., minimizing both distance and cost)? In jMetal you already have the solutions, the operators and the algorithms; only the problem has to be implemented.

Anyway, you are working on experimental code, and I don't know if refactoring it to reuse the entities already provided makes sense to you.

ajnebro commented 9 years ago

And here we find the two similar interfaces that we want for the solutions:

  • variables are to solutions what parameters are to algorithms, such that changing a variable of a solution affects its "value".
  • objectives are to solutions what measures are to algorithms, because they are inferred automatically from the solution within the context in which it is evaluated.

Consequently, the read only "attributes" could correspond to "objectives" while the read & write would be "variables". This would give:

public interface Variable<Solution, Value> {
  public void set(Solution solution, Value value);
  public Value get(Solution solution);
}

and:

public interface Objective<Solution, Value> {
  public Value get(Solution solution);
}

These ideas are attractive, and Juanjo and I have sometimes discussed something similar (particularly in the case of the objectives). The point here is that a solution can have many variables (from one to several thousand) and one or more objectives (up to 20/30 in practice).

In the case of thousands of variables, performance is clearly affected when using lists of objects instead of merely an array of double values, so we left the idea out. In the current development version, the variables are implemented as an ArrayList; I still have to compare the performance of the new version against the old one when solving a problem with a very high number of variables.

matthieu-vergne commented 9 years ago

About the re-implementation, I totally agree. My point was really to focus on the attributes, so I did not care much about the other stuff. You can also see that I did not use any measure.

Now, I don't think that is important: although jMetal should allow us to reuse stuff, it should not force anyone to do so. For instance, if someone already has an existing algorithm running, they should not have to re-implement everything in order to exploit features like experiments. My experimental code plays the role of such legacy code, already implemented for a given purpose. Actually, in a real case, one would take an existing code base which does not use any of our generic interfaces and create a proxy based on Algorithm, so that it can execute its own code within the jMetal framework.

Thus, not reusing the existing code is not a problem, especially given the purpose of the experiment. What we provide are facilities, not requirements, and this is why it is important to keep them as decoupled as possible (and why the D in SOLID is important).

I understand now the idea of no need for a Solution class, but then the resulting framework would not be a new version of jMetal, it would be a quite different tool. The architecture of jMetal is based on to clearly separating the concepts of algorithm, problem, operator and solution, and you are not using them in your hillclimbing algorithm.

Having a solution concept does not necessarily mean having an interface. You can have it as a simple generic type parameter, like List<T> has the concept T, which could be renamed to get List<Item>: you still have the concept, but it is a List-specific concept, and no concrete implementation is required to put instances under the label Item. For me, solutions are the same: they do not need to fulfil any generic constraint, they are relevant concepts simply because they are what an algorithm returns and what solves a problem. The constraints they have to fulfil are algorithm-specific or problem-specific, not generic.

In the case of thousands of variables, performance is clearly affected when using lists of objects instead of merely using an array of double values, so we left the idea out.

Nothing forces you to have primitive variables only; you can have lists of stuff. For instance, for me the path of cities in a TSP can be a single variable. Of course, this means having the corresponding operators, but I think there is also a lot of flexibility to exploit there. Just as a measure manager can be layered on top of another one to provide other measures, a variable manager (if we use the same architecture) can be layered on top of the original one to provide just the relevant variables for the relevant operator. Once again, there are design choices and implementation optimizations to make here. We should have a concrete case to properly illustrate, stress and evaluate it.

Notice that I have revised my interfaces for variables, objectives, and others. I have them on paper, I will see later how to introduce them (here or another issue).

ajnebro commented 9 years ago

You can have it as a simple generics, like List<T> has the concept T, which could be renamed to get List<Item>: you still have the concept, but this is a List specific concept, and no concrete implementation is required to put some instances under the label Item.

We already did that with the old SolutionSet class, which was replaced by List<Solution>, so I'm not closed to that idea. The point is that SolutionSet was in fact a list, and I don't see the same correspondence between Solution and List<Item>.

matthieu-vergne commented 9 years ago

I am not sure I get your point.

A variable, for me, is a parameter of the solution, as I said. Whether you consider your solution to be parameterized by several integers or by a single list of integers is a matter of design choice. From my point of view, given that these integers are potentially dependent (if you map each city to a given integer, then you can set a given value at a given index only if it is not already set at another index), it makes sense to consider all of them together, as a single (independent) variable. Of course, you could say that, in the absolute, any solution can be encoded with a single variable which encloses all the required information... and I would say "precisely": just as a chromosome encloses all the genes, a solution is at bottom nothing more than a sequence of bits, so you could give all the information as a single variable. But once again, it is a design choice: if it makes sense to encode it another way to have a more suitable implementation, then encode it another way. A set of variables is nothing more than a specific representation of the information, like the sequence of bits is another, and we are free to change this encoding to adapt it to the operators at hand.

I would like to illustrate this with an example, but I am tired and do not want to go to sleep too late because I have to wake up early, so let me just give you the intuition: take an operator for the TSP which expects separate variables (a jMetal one or a new one). You can obviously use this operator when you have a solution with one variable per integer, since it takes all of them separately and is designed for that. But you can also use it on a solution which encodes a single variable (a list of integers): all you need to do is create dedicated variables like this:

class SeparatedVariable<S> implements Variable<S, Integer> {
  private final Variable<S, List<Integer>> listVariable;
  private final int index;

  public SeparatedVariable(Variable<S, List<Integer>> listVariable, int index) {
    this.listVariable = listVariable;
    this.index = index;
  }

  public Integer get(S solution) {
    return listVariable.get(solution).get(index);
  }

  ...
}

This way, if you want to use the operator while you have a single list, you instantiate a SeparatedVariable for each index of the list and you give your list of new variables to the operator, which can then interact with your single list variable as if it were separate variables.

Of course, you can also do the reverse (single list operator but dedicated variables).
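To make the adapter concrete, here is a self-contained sketch (it redefines a minimal Variable interface and invents ListSolution/IndexVariable for the example): a single list-valued variable is exposed index by index, so an operator expecting separate variables can still be used.

```java
import java.util.List;

// Minimal Variable interface matching the one proposed earlier in the thread.
interface Variable<Solution, Value> {
  Value get(Solution solution);
  void set(Solution solution, Value value);
}

// Toy solution encoded as a single list-of-integers variable.
class ListSolution {
  List<Integer> values;
  ListSolution(List<Integer> values) { this.values = values; }
}

class ListVariable implements Variable<ListSolution, List<Integer>> {
  public List<Integer> get(ListSolution s) { return s.values; }
  public void set(ListSolution s, List<Integer> v) { s.values = v; }
}

// Adapter: exposes one index of the list variable as a stand-alone variable.
class IndexVariable implements Variable<ListSolution, Integer> {
  private final Variable<ListSolution, List<Integer>> list;
  private final int index;

  IndexVariable(Variable<ListSolution, List<Integer>> list, int index) {
    this.list = list;
    this.index = index;
  }

  public Integer get(ListSolution s) { return list.get(s).get(index); }
  public void set(ListSolution s, Integer v) { list.get(s).set(index, v); }
}
```

An index-based operator would be handed one IndexVariable per position, while the solution keeps its single list variable underneath.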

If this is not clear, give me your usual solution encoding and a usual operator for the TSP, so I can illustrate on concrete stuff.

matthieu-vergne commented 9 years ago

Just to come back to the disappearance of solutions: I do not expect the concept of solution to disappear, but any solution-generic (i.e. algorithm-/problem-independent) constraint to disappear. Thus, while I do not expect any Solution interface to remain, I still expect algorithms or problems to provide variables and objectives (and Algorithm to have a Solution generic parameter; for problems I don't know, but why not), thus solution constraints which are algorithm-/problem-dependent.

Meanwhile, nothing forbids providing facilities to encode solutions in a generic way (like my SolutionStore, but you could imagine anything else you prefer) which already use the right interfaces and thus need no proxies. Any implementation not using such interfaces could easily be adapted through proxies, supported by factories or builders for the most common/advanced cases, so that other features like experiments can be applied to them with minimal implementation effort and performance impact.

I know I am selling my own perspective, and it can be hard to find counter-arguments because I have more expertise in Java. But if you are not confident in my ideas, the best way I can offer for you to stress them (so you can check they fit your requirements) is to ask me how I would implement a concrete case that you have already implemented, and to check that my implementation satisfies all the needs you tried to satisfy by implementing it the way you did.

matthieu-vergne commented 9 years ago

On the other side, my stress test would be: why do you want a concrete Solution interface/implementation?

ajnebro commented 9 years ago

As I see it, the point is not a matter of Java expertise, it is a matter of software architecture. As I mentioned before, to me the architecture is built around solutions, operators, problems and algorithms. That is the result of the view that Juanjo and I have of the framework. When talking with other colleagues who have also designed their own optimization software, they did it in different ways.

There is no unique or best way to design a piece of software like jMetal. Not taking full advantage of the features of the programming language used to implement it is a different issue.

Many people have told us that they like jMetal because of its architecture, so what I'm not looking for in the new version is a radically different system.

matthieu-vergne commented 9 years ago

OK, I get that there are requirements about preserving the existing design. My point, then, should be to show that what I propose still fulfils your requirements, because that is what I think it does.

Basically, you can keep everything you have because, in the absolute, what I say boils down to this change:

public abstract class AbstractEvolutionaryAlgorithm<S extends Solution, R>  implements Algorithm<R>{
  ...
}

changed into:

public abstract class AbstractEvolutionaryAlgorithm<S, R>  implements Algorithm<R>{
  ...
}

Thus, it is no longer a constraint that solutions have to be structured in a given, generic way; it is only a facility: they can be implemented the way you provide. And my claim is that, considering the needs I have understood so far and the features we are developing (e.g. measures, parameters, variables, objectives), this constraint will not be needed anymore.

But you can keep all of it as it is for now. And if I play the fortune teller, here is what I predict: at some point, we will be able to remove this extends Solution without having to change anything else (no warning nor error). But for now, you can keep it as is.

ajnebro commented 9 years ago

the best way I found for you to stress them (so you can check it fits your requirements) is to ask me how I would implement a concrete case that you have already implemented, and check that it satisfies all the needs that you tried to satisfy by implementing it the way you did it.

Ok. Just consider an algorithm such as NSGA-II, which extends the AbstractEvolutionaryAlgorithm class. This algorithm requires a selection, a crossover and a mutation operator.

Currently, the same algorithm is used to solve the MultiobjectiveTSP (permutation), ZDT1 (real), and ZDT5 (binary) problems. You can find the details about how to configure it (including the operators) in:

Of course, I am not suggesting you reimplement the full algorithm. I think a skeleton of the class structure would be enough to give us stuff to discuss.

ajnebro commented 9 years ago

(I was writing the last post at the same time you were writing yours)

matthieu-vergne commented 9 years ago

So do you prefer to go ahead with that point, or to focus on more critical discussions? I think we can go ahead and see in the future whether it is relevant. I expect that at some point we will face cases that need this constraint removed, and as I said, I expect such an evolution will allow us to remove it without having to change anything else, so I feel confident leaving it for now.

The main point of this issue was the clear identification of who is responsible for what, and we came to the idea that variables and objectives should be provided by the algorithm/problem, while the solution acts only as a storage. We can go ahead with that if you find it attractive, I don't mind much. I see the removal of the constraint as future work which will naturally come back when we have to manage real cases of legacy implementations (maybe we can already manage them with the constraint, but even in the best case I think it will require useless additional effort). We can see at that time how to respond to these needs.

So, do we go ahead? Do we keep the generic solution? If we keep it, I will come back to the interfaces and propose revised ones.

ajnebro commented 9 years ago

Basically, you can keep everything you have because, in the absolute, what I say boils down to this change:

public abstract class AbstractEvolutionaryAlgorithm<S extends Solution, R>  implements Algorithm<R>{
  ...
}

changed into:

public abstract class AbstractEvolutionaryAlgorithm<S, R>  implements Algorithm<R>{
  ...
}

Thus, it is no longer a constraint that solutions have to be structured in a given, generic way; it is only a facility: they can be implemented the way you provide. And my claim is that, considering the needs I have understood so far and the features we are developing (e.g. measures, parameters, variables, objectives), this constraint will not be needed anymore.

If the idea of removing the constraint in public abstract class AbstractEvolutionaryAlgorithm<S, R> implements Algorithm<R> is to allow dealing in the future with classes other than those inheriting from Solution, as in your suggestion of List<Item>, I don't see any reason not to do so now :).
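To illustrate the idea, here is a minimal hypothetical hierarchy (not the actual jMetal classes) where the `S extends Solution` bound has been dropped, so a plain `List<String>` can play the role of a solution:

```java
import java.util.List;

// Hypothetical minimal hierarchy: once the `S extends Solution` bound is
// dropped, any type can play the role of a solution.
abstract class UnboundedAlgorithm<S, R> {
    protected List<S> population;

    abstract R getResult();
}

// A "solution" that is just a List<String> (e.g. a tour for a TSP),
// with no Solution interface involved at all.
class ListBasedAlgorithm extends UnboundedAlgorithm<List<String>, List<String>> {
    ListBasedAlgorithm(List<List<String>> initialPopulation) {
        this.population = initialPopulation;
    }

    @Override
    List<String> getResult() {
        return population.get(0); // placeholder: just return the first individual
    }
}

public class UnboundedDemo {
    public static void main(String[] args) {
        ListBasedAlgorithm algorithm = new ListBasedAlgorithm(
                List.of(List.of("city1", "city2"), List.of("city2", "city1")));
        System.out.println(algorithm.getResult()); // [city1, city2]
    }
}
```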

matthieu-vergne commented 9 years ago

Just note that this is not only about this class but a general modification: the concrete implementation of the solution would be chosen at quite a low level, not introduced at high levels of abstraction. So you should revise all your abstract classes to remove such a dependency until you reach a level of implementation where you cannot ignore it anymore... and fill in all your generic parameters correctly at each level {^_^} (something that should be done for the release anyway, so better to start soon; I explain how to do it quickly below), otherwise all your solutions will be plain Object instances.

By the way, I would recommend using more concrete names for generics (Solution instead of S, Result instead of R, etc.). Given that they are "free" parameters, we don't need high-level interfaces and can directly consider concrete implementations (abstract or not). It would then be useful to give these implementations more explicit names telling what each focuses on (e.g. GenericSolution for a solution that just stores attributes, maybe variables and objectives separately, but makes no assumption about value types or quantities, while another ListSolution would be a good candidate for a TSP... I don't think so, but it is just an illustrative example).
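As a sketch of what such a pure-storage GenericSolution could look like (the class and its API are hypothetical, not existing jMetal code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical GenericSolution: pure attribute storage, with no assumption
// about the type or number of variables and objectives.
class GenericSolution {
    private final Map<Object, Object> attributes = new HashMap<>();

    public void setAttribute(Object id, Object value) {
        attributes.put(id, value);
    }

    public Object getAttribute(Object id) {
        return attributes.get(id);
    }
}

public class GenericSolutionDemo {
    public static void main(String[] args) {
        GenericSolution solution = new GenericSolution();
        // Variables and objectives are just attributes; the algorithm/problem
        // decides what to store and under which keys.
        solution.setAttribute("variable:0", 1.5);
        solution.setAttribute("objective:0", 3.25);
        System.out.println(solution.getAttribute("objective:0")); // 3.25
    }
}
```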

Managing generics parameters

Adding the missing ones

To fill in the parameters, you first have to ensure that your parent class is correctly implemented/extended, so this:

public class ParentClass<P1, P2, P3> {
  ...
}

public class ChildClass extends ParentClass {
  ...
}

should become this:

public class ParentClass<P1, P2, P3> {
  ...
}

public class ChildClass<P1, P2, P3> extends ParentClass<P1, P2, P3> {
  ...
}

so you just copy-paste the parameters of the parent class to the child class and its extension. If there are constraints in the parent class, they should be propagated to the child class only, not to the extension:

public class ParentClass<P1 extends ConcreteP1, P2, P3> {
  ...
}

public class ChildClass<P1 extends ConcreteP1, P2, P3> extends ParentClass<P1, P2, P3> {
  ...
}

so the extension just gets the parameters, not their constraints. Then you can review the constraints of the child class if you need to be more specific, but this is the basic way to propagate the generics. If you need additional generics for the child class, just add them, as you have with Algorithm<R> implemented by AbstractEvolutionaryAlgorithm<S, R>, which adds S.

Abstracting: replace concrete implementations by generics

Then, at some point, you will face cases where you use concrete implementations. In such a case, there are two solutions: fix the type completely, or keep a constrained generic parameter.

In the first case, you just have to revise the parameters of the child class correspondingly, for instance passing from:

public class ChildClass<P1, P2, P3> extends ParentClass<P1, P2, P3> {
  ConcreteP1 myVar;
  ...
}

to:

public class ChildClass<P2, P3> extends ParentClass<ConcreteP1, P2, P3> {
  ConcreteP1 myVar;
  ...
}

so you remove the generic from the child class and replace it in the extension with the right type. This modification completely fixes the implementation, so it cannot be changed anymore in further child classes (which would then need casting tricks). If you want to allow someone to use their own version (an extension of ConcreteP1), you should instead add the constraint on the child class parameter and revise the corresponding variables to use the generic (I explain later how to do this massively):

public class ChildClass<P1 extends ConcreteP1, P2, P3> extends ParentClass<P1, P2, P3> {
  P1 myVar;
  ...
}

The advantage is flexibility (a child class can specify a more concrete implementation without having to cast anything); the drawback is keeping the parameter, which will then have to be provided in further child classes. Choosing one or the other is a design choice, but basically you should use the first one where you intend to create the instances. So if your algorithm has an implemented method like ConcreteP1 createMyP1() {...}, you should fix the type completely. If it is an abstract method, or if you let external operators do the creation for you, keeping the generic could be more interesting.
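The two options can be sketched as follows (all class names here are hypothetical placeholders following the discussion above):

```java
// Hypothetical types standing in for a concrete implementation and a user extension.
class ConcreteP1 { }
class ExtendedP1 extends ConcreteP1 { }

// Option 1: the type is completely fixed, because the class itself creates
// the instances; further child classes cannot change it without casting.
class FixingChild<P2, P3> {
    ConcreteP1 createMyP1() {
        return new ConcreteP1();
    }
}

// Option 2: the bounded parameter is kept and creation is delegated, so a
// subclass can plug in its own extension of ConcreteP1 without any cast.
abstract class FlexibleChild<P1 extends ConcreteP1, P2, P3> {
    abstract P1 createMyP1();
}

class UserChild extends FlexibleChild<ExtendedP1, String, String> {
    @Override
    ExtendedP1 createMyP1() {
        return new ExtendedP1();
    }
}

public class FactoryChoiceDemo {
    public static void main(String[] args) {
        ExtendedP1 p1 = new UserChild().createMyP1(); // no cast needed
        System.out.println(p1 instanceof ConcreteP1); // true
    }
}
```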

In any case, you will surely have to revise the code of your child class each time you fill in the parameters of the extension (i.e. going from extends ParentClass to extends ParentClass<P1, P2, P3>), because you will have to change the inferred types accordingly, so a situation initially like this:

public abstract class ParentClass<P extends ConcreteP> {
  abstract P myCustomMethod();
}

public class ChildClass extends ParentClass {
  ConcreteP myCustomMethod() {...}
}

will need the ConcreteP myCustomMethod() of the child class to be revised (change the return type accordingly). This ConcreteP appears because you did not fill in the parameters of the extension, so the compiler uses the most open type, which is the bound defined in the parent class (P extends ConcreteP). Once you fill in your parameters, you have to revise all of this, otherwise it won't compile. Notice that if the parameter was not bounded, the concrete type used is Object. If you see some Object types around, check that they do not come from a free parameter you did not specify.
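This behavior can be seen in a compilable sketch (hypothetical classes mirroring the ParentClass/ChildClass situation above; extending the raw types produces the usual raw-type warnings):

```java
// Hypothetical classes mirroring ParentClass/ChildClass above.
class ConcreteP { }

abstract class BoundedParent<P extends ConcreteP> {
    abstract P myCustomMethod();
}

// Raw extension: the parameter is erased to its bound, so the override
// must return ConcreteP (this compiles, with a raw-type warning).
class RawBoundedChild extends BoundedParent {
    @Override
    ConcreteP myCustomMethod() {
        return new ConcreteP();
    }
}

abstract class UnboundedParent<P> {
    abstract P myCustomMethod();
}

// With no bound, the erased type is Object: these are the "Object instances"
// you get when parameters are left unspecified.
class RawUnboundedChild extends UnboundedParent {
    @Override
    Object myCustomMethod() {
        return "anything";
    }
}

public class RawTypeDemo {
    public static void main(String[] args) {
        System.out.println(new RawBoundedChild().myCustomMethod() instanceof ConcreteP);
        System.out.println(new RawUnboundedChild().myCustomMethod().getClass().getSimpleName());
    }
}
```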

Massive refactoring

The objective is, for instance, that all uses of Solution are replaced by the parameter S in the code of the child class. In Eclipse, you can use refactoring to do this massively and easily; the catch is that at some point the changes could cascade if your implementation was not clean enough. But this will have to be dealt with anyway, and given that almost no parameters were provided, the cascade should not go further than the class you are currently reviewing.

For instance, using refactoring on:

public abstract class AbstractEvolutionaryAlgorithm<S extends Solution, R>  implements Algorithm<R>{
  ...
}

you should run the renaming refactoring feature of your IDE on S, renaming it to Solution, to get this:

public abstract class AbstractEvolutionaryAlgorithm<Solution extends Solution, R>  implements Algorithm<R>{
  ...
}

It should introduce a warning or an error. Then you remove the extends Solution bound:

public abstract class AbstractEvolutionaryAlgorithm<Solution, R>  implements Algorithm<R>{
  ...
}

and then you rename back with the same refactoring feature to the original name S:

public abstract class AbstractEvolutionaryAlgorithm<S, R>  implements Algorithm<R>{
  ...
}

This procedure systematically links any additional uses of Solution that were not yet (explicitly) linked to S. It is not the case for AbstractEvolutionaryAlgorithm, but, for instance, if you had:

public abstract class AbstractEvolutionaryAlgorithm<S extends Solution, R>  implements Algorithm<R>{
  List<S> population;
  Solution bestIndividual;
  ...
}

then removing the extends Solution bound immediately, without any refactoring, would not fix the dependency on Solution, because you would still have this bestIndividual member, while the refactoring lets you reach this state:

public abstract class AbstractEvolutionaryAlgorithm<S, R>  implements Algorithm<R>{
  List<S> population;
  S bestIndividual;
  ...
}

Thus, even if you were not that rigorous, you can still fix these mistakes. Then you should get errors wherever there is actually a requirement to use a concrete implementation. In such a situation, either it is a one-off to fix, or it is a case where you genuinely need the concrete implementation. In the former case, just modify the relevant code. In the latter case, as described before, you either add the constraint on the parameter or completely fix the type by refactoring the parameter:

public abstract class AbstractEvolutionaryAlgorithm<MyConcreteSolution, R>  implements Algorithm<R>{
  List<MyConcreteSolution> population;
  MyConcreteSolution bestIndividual;
  ...
}

and removing it:

public abstract class AbstractEvolutionaryAlgorithm<R>  implements Algorithm<R>{
  List<MyConcreteSolution> population;
  MyConcreteSolution bestIndividual;
  ...
}

so it uses the concrete implementation throughout.

Using renaming refactoring is a good way to make massive changes systematically. You can introduce a fake parameter with the same name as a concrete implementation, rename it to another concrete implementation, and remove the parameter afterwards, thereby replacing one concrete implementation with another (then just change whatever remains, identified by errors, if any).

ajnebro commented 9 years ago

By the way, I would recommend to use more concrete names for generics (Solution instead of S, Result instead of R, etc.).

I agree; the reason for using S or R came from running SonarQube over the project: it reported warnings recommending S instead of Solution, and so on.

matthieu-vergne commented 9 years ago

Do you mean that SonarQube tells you it is better to use single letters than whole words? Or is it just because the parameter name hides (shadows) the Solution interface (Eclipse gives a warning for that)? In the latter case, it should be fixed by removing the Solution interface (normally useless, since we remove any dependency on it) and keeping only concrete implementations with more specific names (and using those where concrete implementations are needed).
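The shadowing problem can be demonstrated in a few lines (illustrative sketch, not jMetal code): a type parameter named Solution hides the interface of the same name inside the class body.

```java
public class ShadowingDemo {
    // An interface named like the jMetal one (local stand-in for illustration).
    interface Solution {
        double getObjective(int index);
    }

    // Inside this class, `Solution` refers to the TYPE PARAMETER, not the
    // interface above: the parameter shadows it, which is why Eclipse warns.
    static class Container<Solution> {
        Solution best;

        Solution get() {
            return best;
        }
    }

    public static void main(String[] args) {
        // The parameter can be bound to anything, e.g. String: nothing ties
        // it to the Solution interface despite the identical name.
        Container<String> container = new Container<>();
        container.best = "not a Solution at all";
        System.out.println(container.get());
    }
}
```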

matthieu-vergne commented 9 years ago

It seems that my idea converges to the Entity-Component-System design pattern. Might be worth reading.
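For reference, a minimal sketch of the pattern (purely illustrative, not a proposal for jMetal's API): the "solution" is just an entity id, variables and objectives are components stored outside it, and the evaluator is a system operating on entities that have the right components.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Entity-Component-System sketch (illustrative only).
public class EcsSketch {

    static class World {
        private int nextId = 0;
        private final Map<Class<?>, Map<Integer, Object>> components = new HashMap<>();

        int createEntity() {
            return nextId++;
        }

        <C> void set(int entity, Class<C> type, C value) {
            components.computeIfAbsent(type, k -> new HashMap<>()).put(entity, value);
        }

        <C> C get(int entity, Class<C> type) {
            return type.cast(components.getOrDefault(type, new HashMap<>()).get(entity));
        }
    }

    // Components: plain data attached to an entity.
    static final class Variables {
        final double[] values;
        Variables(double[] values) { this.values = values; }
    }

    static final class Objective {
        final double value;
        Objective(double value) { this.value = value; }
    }

    // A "system": evaluates an entity that has Variables, attaching an Objective
    // (a toy sum-of-squares, standing in for a real problem evaluation).
    static void evaluate(World world, int entity) {
        double sum = 0;
        for (double v : world.get(entity, Variables.class).values) {
            sum += v * v;
        }
        world.set(entity, Objective.class, new Objective(sum));
    }

    public static void main(String[] args) {
        World world = new World();
        int solution = world.createEntity(); // the "solution" is just an id
        world.set(solution, Variables.class, new Variables(new double[] { 1.0, 2.0 }));
        evaluate(world, solution);
        System.out.println(world.get(solution, Objective.class).value); // 5.0
    }
}
```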

ajnebro commented 9 years ago

I'll take a look to it.

ajnebro commented 9 years ago

Look taken. After a couple of reads, I'm not sure how to apply this pattern to jMetal.

This leads me back to an idea I mentioned to you some time ago: starting a new framework from scratch to explore new ideas, without having to worry about compatibility with jMetal. It could be just a proof of concept, and perhaps it would never become more than a prototype, but it could be worth a try.

matthieu-vergne commented 9 years ago

Could be.