icaros-usc / pyribs

A bare-bones Python library for quality diversity optimization.
https://pyribs.org

[FEATURE REQUEST] Adjustments to use negative objectives (cost)? #283

Open LemonPi opened 1 year ago

LemonPi commented 1 year ago

Description

It would be nice to have an example of a minimization (of a cost) problem instead of the usual maximization (of a reward). The most straightforward approach is to use the negative cost as the objective, but that poses an issue with the usual QD score metric, where unexplored cells contribute 0. Presumably the normalized QD score should address this here:

norm_qd_score=self.dtype(new_qd_score / self.cells),

but I had to dig into the code to confirm whether it makes sense with negative objectives, and there may be other places that implicitly assume a non-negative objective. Even then, the metric would start at 0, jump to a large negative value, and then slowly climb back toward 0 from below. This is less compelling than a monotonically increasing metric.

The examples seem to use best - worst performance, which would work if the worst possible performance were bounded and known ahead of time, but that is not the case in my problem. Computing best - worst on the fly would not be meaningful either, since the metric could be inflated just by finding one terrible solution.

You could also use 1/cost as the objective, but this would warp the optimization landscape and lead to very different behavior.
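
To make the distortion concrete, here is a tiny sketch (made-up costs, not pyribs code) comparing the two transformations:

import numpy as np

costs = np.array([1.0, 2.0, 100.0])

neg = -costs        # [-1., -2., -100.]  -- gaps between solutions are preserved
inv = 1.0 / costs   # [1., 0.5, 0.01]    -- large costs get squashed together near 0

# With -cost, improving a solution from cost 100 to cost 99 changes the objective by 1.0;
# with 1/cost, the same improvement changes the objective by only about 0.0001.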

It would be nice to have a tutorial or example with best practices for minimization problems, and things to watch out for.

btjanaka commented 1 year ago

Hi @LemonPi, this is a great idea. I'm actually not sure how to handle negative objectives when one does not know the limits; this is something we (the pyribs maintainers) have not had to deal with. Figuring this out would require some experimentation and may be a small paper in itself. Unfortunately, I don't think we have bandwidth to handle this issue right now, but we'll keep it as a TODO, and we're open to PRs if you come up with anything.

Also, note that norm_qd_score is just the QD score divided by the total number of cells in the archive -- it is included in the ArchiveStats documentation here, but that page is for the latest version rather than the default stable version shown on https://docs.pyribs.org

LemonPi commented 1 year ago

Yeah, not sure what a good solution for monitoring would be in this case. For example, between execution steps (pokes), the trend of the norm QD score over optimization iterations changes. It's a minimization problem, so closer to 0 is better. When the trend is decreasing with more iterations, it's because the search is finding previously unexplored cells; however, those new elites are worse than average, so they drag the average score down. When the trend is increasing with more iterations, it's because the search is finding better solutions within the existing cells.
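
A tiny numeric sketch of that effect (made-up values, not pyribs code):

num_cells = 10
objectives = [-2.0, -3.0]                  # objectives (negated costs) of two occupied cells
print(sum(objectives) / num_cells)         # -0.5
objectives.append(-9.0)                    # a newly discovered, worse-than-average elite
print(sum(objectives) / num_cells)         # -1.4 -- norm QD score drops despite better coverage
objectives[-1] = -4.0                      # improving that elite within its cell
print(sum(objectives) / num_cells)         # -0.9 -- score climbs back toward 0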

Interestingly, the archives for the two pokes are very different.

btjanaka commented 1 year ago

I will note that we recently merged a PR (#297) that adds a qd_score_offset parameter to the archives. This makes it possible to predefine an offset which will be used when computing the QD score for the archive; this offset will be subtracted from every objective value before summing them together to get the QD score. This doesn't help much if your objectives can be infinitely negative, but if your objectives tend to be above some known value, it can be pretty helpful. This is what we do in the Lunar Lander tutorial: https://docs.pyribs.org/en/latest/tutorials/lunar_lander.html#gridarchive -- here's the relevant excerpt:

Above, we specified qd_score_offset=-300 when initializing the archive. The QD score (Pugh 2016) is a metric for QD algorithms which sums the objective values of all elites in the archive. However, if objectives can be negative, this metric will penalize an algorithm for discovering new cells with a negative objective. To prevent this, it is common to normalize each objective to be non-negative by subtracting an offset, typically the minimum objective, before computing the QD score. While lunar lander does not have a predefined minimum objective, we know from previous experiments that most solutions stay above -300, so we have set the offset correspondingly.
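
As a minimal sketch of how this could look for a cost-minimization problem (the constructor arguments assume a pyribs version that includes #297; the solution dimension, grid dimensions, ranges, and offset value are placeholders to adapt to your problem):

from ribs.archives import GridArchive

archive = GridArchive(
    solution_dim=10,                      # placeholder: dimensionality of your solutions
    dims=[20, 20],                        # placeholder: number of cells per measure
    ranges=[(-1.0, 1.0), (-1.0, 1.0)],    # placeholder: bounds of the measure space
    qd_score_offset=-300.0,               # empirical lower bound on the objective (-cost)
)

# Pass -cost as the objective when adding solutions; as long as -cost stays above the
# offset, each new elite contributes a non-negative amount to the QD score.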

LemonPi commented 1 year ago

I see, thanks, I'll check that out! It's theoretically unbounded, but I think I can set a bound empirically. In this case, unexplored bins will receive the score offset instead of 0, right? (That way the unnormalized QD score should be monotonically increasing as long as I don't encounter any solutions with a score below the offset.)

btjanaka commented 1 year ago

Unexplored cells are simply not counted in the score. The normal QD score looks like:

qd_score = sum(objectives)

where objectives is the array of objective values of all elites in your archive. With the offset, the score becomes

qd_score = sum(objectives - offset)

So cells without an elite are still not counted at all.
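
A short self-contained sketch of that computation (plain NumPy, not the pyribs internals):

import numpy as np

offset = -300.0                                        # e.g., the qd_score_offset from above
elite_objectives = np.array([-120.0, -45.0, -250.0])   # objectives of occupied cells only

qd_score = np.sum(elite_objectives - offset)           # 180 + 255 + 50 = 485.0
norm_qd_score = qd_score / 1000                        # divided by the total cell count, e.g. 1000
print(qd_score, norm_qd_score)                         # 485.0 0.485

# Every elite whose objective is above the offset contributes a non-negative term,
# which is why the score behaves monotonically in the situation described in the next comment.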

btjanaka commented 1 year ago

And yes, the result is still that the QD score will be monotonically increasing as long as you don't encounter any solutions with a score below the offset.

LemonPi commented 1 year ago

Tried this out and it seems to work fine. It is concerning that my CMA-MEGA is doing worse than CMA-ME, though - thoughts? (attached plot: qd_drill_line)

btjanaka commented 1 year ago

Hard to say, but here are a couple of things to consider:

tehqin commented 1 year ago

Without more information about the domain, it's hard to tell whether the issue is the setup of the DQD algorithm or a domain where derivative-free QD simply works better than DQD. Keep in mind there are domains where an optimizer like CMA-ES outperforms gradient descent, even for convex objectives.

Some things to keep in mind:

  • CMA-ES (and CMA-MAE) is derivative-free, but approximately second-order (due to being a natural gradient method). In settings with highly ill-conditioned representations, you may see CMA-MAE outperform a first-order method like CMA-MAEGA.
  • Approximate second-order methods can win on low-dimensional problems; usually a DQD method starts to outperform derivative-free methods on high-dimensional problems. In the DQD and CMA-MAE papers, we found that DQD gave only a small benefit in the 512-dimensional search space when searching StyleGAN, but a significant benefit in the ~10,000-dimensional search space (full w-space latents) of StyleGAN2.
  • Gradient descent requires more tuning than adaptive algorithms like CMA-ES. For example, if you train a neural network with Adam but the step size is too large, you will get much worse results. Picking a good starting sigma and Adam learning rate for CMA-MAEGA may help you get better results; a rough tuning sketch follows this list.
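
For the last point, one simple way to pick those values is a small grid sweep; here is a rough sketch (run_cma_maega is a hypothetical stand-in for your own training loop, not a pyribs function):

import itertools

def run_cma_maega(sigma0, lr, iterations=1000):
    # Hypothetical stand-in: build your archive/emitters/scheduler with these
    # hyperparameters, run `iterations` QD iterations, and return the final QD score.
    ...
    return 0.0  # replace with the real QD score

# Sweep a small grid of initial step sizes (sigma0) and Adam learning rates (lr).
results = {
    (sigma0, lr): run_cma_maega(sigma0, lr)
    for sigma0, lr in itertools.product([0.01, 0.05, 0.1, 0.5], [0.001, 0.01, 0.05])
}
best_sigma0, best_lr = max(results, key=results.get)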

We're still working on making DQD algorithms more robust. The approach is relatively new, so it's valuable to know about problems where the current algorithms struggle. Did you want to chat offline about your setup? We may be able to help you tune the DQD algorithms to get good results, or catch a bug in your current setup.

LemonPi commented 1 year ago

I do still gain some benefit from CMA-MEGA over CMA-ME on the metrics I care about, just not dramatically so. The QD score is still oddly worse for CMA-MEGA, but for my application I don't necessarily care about illuminating the entire space, just enough of it to find a set of good solutions. (Also, the "mu" selection rule that was in some example emitter code turned out to be quite a bit worse than "filter" for my problem, since the solutions we initialize the archive with are already known to be pretty good.)

I recently submitted my work to a conference with double-blind review, so I'm not sure how much I can talk about it during the review process - but it would definitely be good to have a chat afterwards!