python-adaptive / adaptive

:chart_with_upwards_trend: Adaptive: parallel active learning of mathematical functions
http://adaptive.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

adaptive fails to discover features #113

Closed: jbweston closed this issue 5 years ago

jbweston commented 5 years ago

(original issue on GitLab)

opened by Pablo Piskunow (@pablopiskunow) at 2017-10-27T13:04:19.803Z

In this test we see how adaptive fails to discover all the peaks of an unknown distribution of Gaussians. This failure can be overcome by setting a minimum and maximum size for the triangles.

how it fails

Grab the notebook presenting adaptive and provide the following function

import numpy as np

def normal(xy):
    """Sum of randomly positioned Gaussian peaks of width sigma."""
    np.random.seed(0)  # fixed seed so the peak positions are reproducible
    num_points = 10
    x, y = xy
    sigma = 0.05
    result = 0
    for _ in range(num_points):
        x0, y0 = 2 * np.random.rand(2) - 1  # peak center, uniform in [-1, 1]^2
        result += 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(
            -((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma) ** 2)
    return result

This is a set of randomly positioned Gaussians of width sigma. To avoid more randomness coming from parallelization, let's set

from concurrent.futures import ProcessPoolExecutor
executor = ProcessPoolExecutor(max_workers=1)

and run

import adaptive

bounds = [(-1, 1), (-1, 1)]
learner = adaptive.learner.Learner2D(normal, bounds=bounds)
runner = adaptive.Runner(learner, executor=executor,
                         goal=lambda l: l.n > 10 and l.loss() < 1e-2)

We get the following result

[figure: old_adaptive_1]

It looks like adaptive works well! But it's a lie...

There are more points yet to discover.

[figure: old_adaptive2]
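For reference, here is one way this comparison could be reproduced. The snippet below is a sketch that is not part of the original report; it assumes the learner exposes its sampled points via learner.data, and it simply evaluates normal on a dense homogeneous grid next to them.

import numpy as np
import matplotlib.pyplot as plt

# Evaluate the true function on a dense homogeneous grid to reveal the peaks
# that adaptive has not discovered, next to the points it actually sampled.
xs = ys = np.linspace(-1, 1, 200)
zs = np.array([[normal((x, y)) for x in xs] for y in ys])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
pts = np.array(list(learner.data.keys()))
ax1.scatter(pts[:, 0], pts[:, 1], s=2)
ax1.set_title("points chosen by adaptive")
ax2.imshow(zs, extent=(-1, 1, -1, 1), origin="lower")
ax2.set_title("ground truth on a homogeneous grid")
plt.show()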

how to solve it

The max_area parameter ensures that we do not lose small features (hidden inside a big triangle), and the min_area parameter ensures that we do not oversample very high peaks beyond a defined plot resolution.

Since plot resolution is a more intuitive way to understand the plots (rather than scaled areas of triangles), let's assign max_resolution = 1 / min_area and min_resolution = 1 / max_area.

In the current state of adaptive there are no bounds on the areas, which is like having max_resolution = np.inf and min_resolution = 1, i.e. the whole area of the 2D plot.

Let's put in more reasonable parameters: for example, we wish to see all the features that could be discovered by a 10x10 plot (min_resolution = 10**2), at the resolution that could be achieved with a 50x50 plot (max_resolution = 50**2). We add this and modify the loss function with

areas[areas < min_area] = 0       # fine enough already: do not refine these further
areas[areas > max_area] = np.inf  # too coarse: force these to be subdivided first

edit: swap min, max
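For concreteness, here is a minimal sketch of how such area bounds could be attached to a Learner2D through a custom loss. The loss_per_triangle argument and the default_loss/areas helpers are assumptions about the current adaptive API, and the area bounds are derived from the resolutions discussed above; none of this is part of the original proposal.

import numpy as np
import adaptive
from adaptive.learner.learner2D import areas, default_loss

def resolution_loss(min_resolution=10**2, max_resolution=50**2):
    # Translate the two resolutions into the area bounds discussed above.
    min_area = 1 / max_resolution
    max_area = 1 / min_resolution

    def loss(ip):
        losses = default_loss(ip)       # the usual deviation-based loss
        A = areas(ip)
        losses[A < min_area] = 0        # fine enough already: never refine further
        losses[A > max_area] = np.inf   # too coarse: subdivide before anything else
        return losses

    return loss

learner = adaptive.Learner2D(normal, bounds=[(-1, 1), (-1, 1)],
                             loss_per_triangle=resolution_loss())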

With these parameters, we get the expected results.

[figure: new_adaptive_1]

which compares well with the homogeneous sampling

[figure: new_adaptive2]

jbweston commented 5 years ago

originally posted by Anton Akhmerov (@anton-akhmerov) at 2017-10-27T13:30:56.002Z on GitLab

@pablopiskunow thanks for reporting the issue.

I think we have to be more specific here about what we consider a failure of adaptive. The idea of adaptive is to prioritize sampling "interesting" regions over the boring ones. In your example this is exactly what happens. Further, adaptive would eventually also discover the other missing peaks, because it would sample the already existing peaks until they become "boring" and then start to look elsewhere.

If the whole function is composed of randomly positioned sharp peaks of the same width, the optimal strategy for discovering all these peaks will always begin from sampling on a homogeneous grid that is sparse enough. Any deviation from this strategy will discover the peaks more slowly. From this point of view, loss is merely a statement of what the user prioritizes. Therefore I don't think your example is a failure; it is an illustration that sometimes different loss functions may be better than the default one.

Eventually we aim to make it easy to specify varied sampling strategies by tweaking a learner's loss function. I am unsure whether implementing min_area and max_area is a sufficiently general way of making such a modification, and I see some problems with it. For example, it will assign the same loss to triangles with 1.01 × max_area and 100 × max_area. I believe that in order to understand what kind of tweaking of the loss we may need, we will have to:

  1. assemble a collection of possible function ensembles that a 2D learner may encounter
  2. set up automated testing of performance on functions from this ensemble (a rough sketch follows this list)
  3. declare what kind of internal learner state may be exposed to the user
  4. investigate which functions of this internal learner state are sufficiently flexible to allow efficient learning of functions from the ensemble
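To make steps 1 and 2 concrete, here is a rough sketch of what such an automated check might look like. Everything below (the ensemble generator, the peak-recovery metric, and the use of adaptive.runner.simple and learner.data) is a hypothetical illustration, not an agreed-upon design.

import numpy as np
import adaptive

def random_peaks(seed, num_peaks=10, sigma=0.05):
    # One member of a test ensemble: randomly positioned Gaussian peaks in [-1, 1]^2.
    rng = np.random.RandomState(seed)
    centers = 2 * rng.rand(num_peaks, 2) - 1

    def f(xy):
        x, y = xy
        return sum(np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2))
                   for x0, y0 in centers)

    return f, centers

def fraction_of_peaks_found(learner, centers, tol=0.05):
    # A peak counts as "found" if some sampled point lies within tol of its center.
    pts = np.array(list(learner.data.keys()))
    return np.mean([np.min(np.hypot(*(pts - c).T)) < tol for c in centers])

for seed in range(5):
    f, centers = random_peaks(seed)
    learner = adaptive.Learner2D(f, bounds=[(-1, 1), (-1, 1)])
    adaptive.runner.simple(learner, goal=lambda l: l.loss() < 1e-2)
    print(seed, fraction_of_peaks_found(learner, centers))
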
jbweston commented 5 years ago

originally posted by Pablo Piskunow (@pablopiskunow) at 2017-10-27T14:04:16.023Z on GitLab

I think that for any fixed definition of loss there will always be some kind of function that fools it, since the space of functions is immense. Therefore, we could think about the "outcome" instead of the "input": we can analyse the plots that adaptive will produce, if plotting is the main purpose.

From the plotting point of view, it happens that we oversample some regions of the plot, and leave other regions unexplored.

If we use loss as a goal, and therefore we wait until all the necessary points are calculated, there might still be some regions unexplored.

On the other hand, if we set a max_resolution we will not waste function evaluations in an already dense area (even if it is still not fully resolved, and therefore "interesting" for the learner). Having more points there will yield no profit from the plotting point of view, although it might be profitable if we want to integrate the function around a pole/divergence.

Along with this, if we set a min_resolution that fixes a max_area, there will at first be a number of triangles with infinite loss. This stops being the case as we let the runner run: all the triangles with areas above max_area will be broken down into smaller triangles first, thereby ensuring a minimum sampling resolution of the plot and ensuring that all features wider than this minimum area are discovered.

jbweston commented 5 years ago

originally posted by Pablo Piskunow (@pablopiskunow) at 2017-10-27T14:06:57.991Z on GitLab

edit: mixed min and max areas above

jbweston commented 5 years ago

originally posted by Pablo Piskunow (@pablopiskunow) at 2017-10-27T15:37:58.584Z on GitLab

Thinking more about it, it is redundant to define a goal=lambda l: l.loss() < tol and a max_resolution at the same time, since the minimum loss[1] will be given by np.sqrt(min_area)[2]. Therefore, setting tol = (2/max_resolution)**0.5 yields the same result as not specifying the max_resolution to the learner (tested).
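As a quick numerical illustration of that equivalence (this only restates the formulas above, using the 50x50 resolution from the earlier example):

max_resolution = 50**2             # resolution of a 50x50 plot
min_area = 1 / max_resolution      # the corresponding smallest triangle area, 0.0004
tol = (2 / max_resolution)**0.5    # the equivalent loss tolerance, about 0.0283
print(min_area, tol)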

So the only relevant parameter would be min_resolution, which would set a max_area. Adding this parameter can provide a reasonable way of specifying the behaviour of the learner in the two stages of "exploring" and then "refining".

The "exploring" stage will first ensure that all triangles are smaller than some defined area, and then, when the areas are all below a threshold, the "refining" stage will start, by using the dev value inside the loss function.

Adding this feature would let us guarantee something to the user about the results: all features bigger than a threshold will be resolved, and the degree of resolution is given by the goal.

[1] edit: the minimum loss for the worst triangle, "worst" meaning dev = 1, where dev are the normalized deviations of the function from its interpolated values.

[2] edit2: for some reason, the areas are square-rooted in the current definition of loss. This choice doesn't seem to be justified.

jbweston commented 5 years ago

originally posted by Bas Nijholt (@basnijholt) at 2018-01-02T14:54:07.015Z on GitLab

@pablopiskunow do you think this issue is solved by what I write in https://gitlab.kwant-project.org/qt/adaptive/merge_requests/11#note_10696?

If so, could you close this issue?

jbweston commented 5 years ago

originally posted by Pablo Piskunow (@pablopiskunow) at 2018-01-03T02:38:43.023Z on GitLab

In my opinion, it would be quite useful if adaptive could guarantee (by default) a minimum size of features it can discover. Otherwise the user will not be aware of possibly missing features.

That said, with the fix of allowing a custom loss function, an aware user can pass one, so this issue can be closed.

jbweston commented 5 years ago

originally posted by Bas Nijholt (@basnijholt) at 2018-02-13T14:04:26.670Z on GitLab

The following few lines of code do essentially the same thing without complicating the interface.

import numpy as np
import adaptive
from collections import OrderedDict

def add_minimum_resolution(learner, minimum_resolution):
    # Pre-seed the learner's internal point stack with a homogeneous grid of
    # points that must be evaluated before adaptive refinement takes over.
    xs = np.linspace(-0.5, 0.5, minimum_resolution)
    learner._stack = OrderedDict({(x, y): np.inf for x in xs for y in xs})

# `ring` is the example function from the adaptive example notebook.
learner = adaptive.Learner2D(ring, bounds=[(-1, 1), (-1, 1)])
add_minimum_resolution(learner, 40)
runner = adaptive.Runner(learner, adaptive.runner.SequentialExecutor(),
                         goal=lambda l: l.loss() < 0.01)

jbweston commented 5 years ago

originally posted by Joseph Weston (@jbweston) at 2018-02-19T17:53:51.621Z on GitLab

fixed by gitlab:!49