ppgaluzio / MOBOpt

Multi-objective Bayesian optimization
MIT License

non_dominant points #10

Closed RodrigoAVargasHdz closed 3 years ago

RodrigoAVargasHdz commented 4 years ago

Hi,

I have a couple of simple questions about the code's hyperparameters and its output.

1) Any idea how I should choose the values of prob and q?

2) If I understand correctly, your code has four outputs: "front", "pop", "y_Pareto" and "x_Pareto". "front" and "pop" are the Pareto front and Pareto points estimated by the NSGA-II algorithm. "front" does not contain the actual values of the objective functions at the points in "pop", correct? For the objective functions I am using, "front" contains values that the objective functions cannot take. Furthermore, after a long optimization (750 iterations), if I evaluate the objectives at the points in "pop", the results differ drastically from the values in "front". Both arrays have dimension (n_pts, -).

About "y_Pareto" and "x_Pareto", your code says these are non-dominated Pareto points. By eye inspection, however, they appear to be the actual Pareto points of the objective functions, and the dimension of these arrays is not (n_pts, -).
Should I consider "y_Pareto" and "x_Pareto" as the actual Pareto set of points?
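
A minimal sketch of the README-style call that produces these four outputs (toy objective only; the exact keyword names may differ between versions):

```python
import numpy as np
import mobopt as mo

# Toy two-objective problem, just to illustrate the call (not the real objectives).
def objective(x):
    f1 = -np.sum((x - 0.25) ** 2)
    f2 = -np.sum((x + 0.25) ** 2)
    return np.array([f1, f2])

bounds = np.array([[-1.0, 1.0]] * 3)  # 3 parameters, each in [-1, 1]

optimizer = mo.MOBayesianOpt(target=objective, NObj=2, pbounds=bounds)
optimizer.initialize(init_points=10)

# front/pop come from NSGA-II run on the GP models.
front, pop = optimizer.maximize(n_iter=50, prob=0.1, q=0.5)

# y_Pareto/x_Pareto: non-dominated subset of the points actually evaluated.
y_pareto, x_pareto = optimizer.y_Pareto, optimizer.x_Pareto
print(front.shape, pop.shape, y_pareto.shape, x_pareto.shape)
```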

Thanks! Rodrigo Vargas

ppgaluzio commented 4 years ago
  1. The parameters are somewhat experimental; the optimal values can differ depending on the problem.

prob should not be too large, or you will almost always choose the next point at random; if it is too small, you risk getting stuck in a suboptimal solution. I would say something between 0.1 and 0.25 should be good. Keep in mind that if you expect your Pareto front to be discontinuous, a larger value of prob might be better.

The optimal value of q also depends on the problem. Smaller values tend to favour unexplored regions of the search space, while larger values favour unexplored regions of the objective space. I would start from 0.5 for most problems, but again, a little experimenting is a good idea.

  2. Your interpretation of front and pop is correct: they are generated by applying NSGA-II to the Gaussian processes, so they are expected to agree with the Pareto front only after good convergence has been achieved. As for y_Pareto and x_Pareto, they are the set of non-dominated points among the points at which the optimizer actually called the objective function. For larger numbers of iterations, both sets are expected to converge. For well-behaved functions, though, where a very small number of iterations is sufficient, front and pop provide a better approximation of the Pareto front.
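
To make the distinction concrete, here is a small self-contained sketch (plain NumPy, not MOBOpt's internal code) of the kind of non-dominated filtering that y_Pareto / x_Pareto correspond to, assuming every objective is maximized:

```python
import numpy as np

def non_dominated_mask(Y):
    """True for rows of Y (n_points x n_obj, maximization) not dominated by any other row."""
    mask = np.ones(Y.shape[0], dtype=bool)
    for i in range(Y.shape[0]):
        # Row j dominates row i if it is >= in every objective and > in at least one.
        dominates_i = np.all(Y >= Y[i], axis=1) & np.any(Y > Y[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

# Example: the third point is dominated and would be filtered out.
Y = np.array([[1.0, 0.1], [0.5, 0.5], [0.2, 0.2], [0.1, 1.0]])
print(non_dominated_mask(Y))  # [ True  True False  True]
```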
RodrigoAVargasHdz commented 4 years ago

OK, the problem is that if I evaluate the objective functions at each element of "pop", the values are totally different from "front". And what is even stranger is that the sets 'PF' and 'PS' actually have better and more consistent values :/
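
A minimal sketch of such a consistency check, reusing the hypothetical objective, pop and front names from the sketch in the first post:

```python
import numpy as np

# Re-evaluate the true objectives at the NSGA-II population and compare with
# the GP-based front returned by maximize().
true_vals = np.array([objective(x) for x in pop])
gap = np.abs(true_vals - front)
print("max |f(pop) - front| per objective:", gap.max(axis=0))
```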

ppgaluzio commented 4 years ago

The agreement between PF and front is deeply connected to how well the Gaussian processes are fitting your objectives. From what you are describing, I would expect that either your function is badly behaved close to the Pareto front (or perhaps just fluctuates too much; the GPs try to build a smooth approximation of your function, so if it varies a lot, a large number of points is required to provide a good approximation), or you have noise in your measurements of the objectives. In both cases, you may want to change the kernel of the GP (perhaps even add a white-noise kernel with low amplitude); there was a recent commit to master in which I added the feature of passing the kernel object to the optimizer.

In any case, keep in mind that PF is always the subset of non-dominated points from the set of all points probed by the method, which is why they seem more consistent. They might actually be a good approximation to the true Pareto front. When I wrote this code I had in mind very expensive objectives, in which case PF would never have enough points, compromising how well it could represent the Pareto front. The use of the front set solved that covering problem pretty well, as long as your functions are well represented by the GPs, which we can expect to be true in most cases.
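
A minimal sketch of what that could look like with scikit-learn kernels (assuming the optimizer accepts the kernel through a kernel= argument, per the commit mentioned above, and reusing the objective name from the earlier sketch):

```python
import numpy as np
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
import mobopt as mo

n_params = 3

# Anisotropic Matern-2.5 plus a small white-noise term to absorb noisy objectives.
kernel = Matern(length_scale=np.ones(n_params), nu=2.5) + WhiteKernel(noise_level=1e-3)

optimizer = mo.MOBayesianOpt(target=objective, NObj=2,
                             pbounds=np.array([[-1.0, 1.0]] * n_params),
                             kernel=kernel)
```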

RodrigoAVargasHdz commented 4 years ago

OK, I will try adding white noise for stability. I am working with the version where the kernel is an anisotropic Matern-2.5. Thanks for all the help :)
