Receding horizon explorer gets stuck in local maxima

yoos commented 10 years ago

The scores in our current scoremap represent the amount of science we can do at a location. The receding horizon explorer looks ahead a few steps, adding these scores together, and chooses whichever branch promises the most science.

This makes the explorer prone to finding local maxima, or at least getting stuck in a small cluster of peaks before the scores depreciate enough to push the explorer to other areas of the map.

I see a few ways out of this:

Idea 1: Generate a second scoremap from the science scoremap to represent the usefulness of navigating to a certain location, based on what other science-rich areas we can explore from that location. We would run the explorer on this second, navigation scoremap.

Idea 2: Downsample the science scoremap, run the same exploration algorithm on this new map, and use the decisions generated here to bias the decisions generated on the full map. If we downsample to the point where the maximum gap between peak clusters is smaller than our lookahead value, this wil help the explorer jump that gap when it is nearby. If we downsample to the point where we can span the entire length of the downsampled map with the number of lookahead steps, the explorer could identify high-value areas from any point on the map. I think the former is much more realistic, however.

@yawei1 @drews256

yoos commented 10 years ago

Idea 3: Periodically evaluate the state of the science costmap for greener pastures, and if the current area is sufficiently depreciated, switch from normal exploration to a beeline towards that greener pasture before continuing.

yoos commented 10 years ago

Idea 4: Separate the concept of lookahead from actual data sampling. We can make the lookahead step large enough to cover a larger portion of the map, even if an actual movement step is shorter than the lookahead step. Effectively, we would make the lookahead step coarser than the movement step, instead of the two being one and the same as it is now.

This is what happens if the lookahead distance is too short:

rdml_lookahead_too_short

yoos commented 10 years ago

Idea 5: Depreciate into the negative. Instead of depreciating by a percentage amount, which will keep all values above zero, we can expedite movement away from already-explored areas by pushing values far below the norm.

Unfortunately, this is still susceptible to becoming stuck in local maxima, so it doesn't solve our problem.

yoos commented 10 years ago

Idea 6: Implement a high-level goal generator that, given our transport budget and estimated science available on the map, "rations" the budget to certain areas of the map. Once the explorer has expended its rationed budget exploring one peak, it should beeline to the next peak (yay, let's solve the TSP) to continue exploration.

~~I think this can be achieved with a single policy that increasingly biases the explorer towards the next goal as the rationed budget runs out.~~

~~That said, would we be just as effective if we ran lawnmower around each peak, since we know our rationed budget?~~

We're not trying to solve TSP.

yoos commented 10 years ago

Idea 7: Run a sparse sampler around the current location after every step and hopefully identify faraway peaks.

Doesn't sound so useful since we know where the peaks are ahead of time.

drews256 commented 10 years ago

I think idea 4 makes the most sense.

Sent from my iPhone

On Sep 5, 2014, at 3:46 PM, "Soo-Hyun Yoo" notifications@github.com wrote:

Idea 4: Separate the concept of lookahead from actual data sampling. We can make the lookahead step large enough to cover a larger portion of the map, even if an actual movement step is shorter than the lookahead step. Effectively, we would make the lookahead step coarser than the movement step, instead of the two being one and the same as it is now.

— Reply to this email directly or view it on GitHub.

yoos commented 10 years ago

~~Combined ideas 3, 4, and 7 (mostly 4)~~

Idea 8: Vary step distance and branching factor depending on average score and homogeneity around current location. https://github.com/osurdml/treemower/commit/6a7245f260dc0f289bd7f256c001d077912d7e6a

It's a bit jumpy, but it found the peaks! Here are some test runs from various start points given the same transport budget. Maybe it was accidental, but all four tests explore the high-yield area around (350, 400):

(0, 0): rdml_variable_step_dist_0_0

(250, 250): rdml_variable_step_dist_250_250

(50, 250): rdml_variable_step_dist_50_250

(250, 50): rdml_variable_step_dist_250_50

osurdml / issues

Receding horizon explorer gets stuck in local maxima #26