Suggestion: hide the actual function to be evaluated from learners #93

Open basnijholt opened 5 years ago

(original issue on GitLab)

opened by Christoph Groth (@cwg) at 2018-01-20T14:43:17.989Z

Learners do not need to have access to the function object they are trying to predict. I believe that they should not have access to it.

Learners request points and are fed back results. At no point do they need to know how the results are actually obtained. Not knowing something that one does not need to know can be a good thing. Learners can be useful to predict all kinds of functions, not only things that can be represented well as a Python function.

Of course any way to obtain results can be expressed as a function (if necessary with internal state and blocking). But this can lead to considerable unnecessary complexity and inefficiency.

Examples:

* The function to be learned is an asynchronous function, defined with `async def`. One can write a callable object that internally uses async programming and blocks on calls, but if this one is driven by something like the current runner (that uses asyncio itself), complexity rapidly explodes.
* The function to be learned is run by a remote procedure call (over the network, using some messaging library) to one of 20 available nodes. The remote requests have to be collected, load-balanced and perhaps submitted in some particular way that is only known by the runner. Again, this can be all abstracted as a callable object, but the much better approach is to have a runner that is specific to that messaging library and knows how to deal with it.
* The function to be learned is the measurement of some experiment that involves doing something mechanical (moving a robotic arm, say). As such, requests should be sorted by x coordinate, so that the arm has to move as little as possible. Again, a custom runner is the most elegant solution.
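
A minimal sketch of the first example, assuming a runner that is handed the function separately and awaits it directly. `ToyLearner`, `slow_square` and `async_runner` are hypothetical names made up for this illustration, not part of adaptive; only the ask/tell shape is taken from the discussion.

```python
import asyncio


class ToyLearner:
    """Hypothetical stand-in for a learner: stores data only, never the function."""

    def __init__(self, points):
        self._pending = list(points)
        self.data = {}

    def ask(self, n):
        # Hand out up to n points that have not been requested yet.
        xs, self._pending = self._pending[:n], self._pending[n:]
        return xs

    def tell(self, x, y):
        self.data[x] = y

    def done(self):
        return not self._pending


async def slow_square(x):
    # Placeholder for a function that is naturally asynchronous.
    await asyncio.sleep(0)
    return x * x


async def async_runner(learner, function, batch_size=2):
    # The runner, not the learner, knows that `function` must be awaited.
    while not learner.done():
        xs = learner.ask(batch_size)
        ys = await asyncio.gather(*(function(x) for x in xs))
        for x, y in zip(xs, ys):
            learner.tell(x, y)


learner = ToyLearner([0, 1, 2, 3, 4])
asyncio.run(async_runner(learner, slow_square))
```

The learner here never sees `slow_square`, so nothing in it has to care whether results come from a coroutine, a remote call or a measurement.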

@jbweston, I'm not convinced by the argument that a learner should encapsulate everything needed, including what should be learned. IMHO the whole point of a learner is that it is useful without the function. E.g. it should be possible to pickle a learner without pickling the possible monstrosity of a function (that could make calls to some horrible DFT library) that it approximates.

originally posted by Joseph Weston (@jbweston) at 2018-01-31T11:03:22.033Z on GitLab

At the moment we keep a reference to the learned function inside the learner. This is not a fundamental design decision, but merely done so that users only have to pass around 1 object (the learner) rather than keeping track of the function-learner pairing themselves. In all the concrete use-cases that we have encountered so far this seems to make the most sense.

AFAICT the only fundamental difference between storing the function in the learner vs. not is the question of pickling. As you know the learner never actually accesses the property that is used to store the function; it is only accessed by Runners. The current runner assumes that `learner.f` is a Python function, but we could of course have different runners that expect different things of `learner.f`. We could have a runner that expects an `async def` function, or a runner that expects a URL (to identify a remote procedure). None of this has anything to do with whether we store the "function" in the learner. People just need to be aware that if they specify the function to be learned as a string, then they'd better damn well have a runner available that knows how to interpret this.

OTOH, how would you implement the BalancingLearner? This is a learner which itself contains several learners and chooses points from the "best" learner each time they are requested.

> The function to be learned is the measurement of some experiment

This would IMO be better encapsulated by a learner that chooses points based not only on where it is best to learn the function, but also on the experimental constraints.

originally posted by Christoph Groth (@cwg) at 2018-08-21T09:48:02.973Z on GitLab

I didn't understand what BalancingLearner does (even though I read the docstring -> hint), but now Anton explained it to me. So a balancing learner is an object that balances the effort of learning multiple (in general unrelated) functions. For example, multiple integrals can be calculated at the same time and the effort balanced such that their respective absolute errors are of a comparable magnitude.

I think that the current balancing learner could be changed to accept a learner class and a sequence of functions to be learned. That would work just as well as the current design.

However, I think that the idea of doing the balancing in a learner is actually a bad one. I think that a clean design is to define learners as objects that learn some function over some domain, and runners as objects that know how to evaluate some function but ignore its structure. This provides for a clear separation of responsibilities. A runner takes care of, say, a parallel framework, or measuring, or async execution. A learner focuses on approximating a function that belongs to some class.

So it makes sense to see the learner as an approximation of the function and it would be IMHO very natural to add a __call__ method that predicts the function. But what should such a __call__ method do for a balancing learner where the domains of the sub-learners don't even have to be the same data type?

It seems to me that balancing should be rather something that is built into most runners. I don't see a reason why almost any runner shouldn't have this capability. It is also more natural given that a runner exploits some resource (say a computing cluster or a measuring apparatus), to let it distribute the work in an appropriate way.
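
As a rough sketch of what balancing built into a runner could look like: the runner below is given several learners and one function per learner, and always evaluates the next point of the learner with the largest loss. All names (`MidpointLearner`, `balancing_runner`) are made up for this illustration, and the loss is a deliberately naive one.

```python
class MidpointLearner:
    """Hypothetical 1D learner: bisects the widest interval; stores no function."""

    def __init__(self, bounds):
        self.bounds = bounds
        self.data = {}

    def ask(self):
        # First request the two end points, then the midpoint of the widest gap.
        for bound in self.bounds:
            if bound not in self.data:
                return bound
        xs = sorted(self.data)
        a, b = max(zip(xs, xs[1:]), key=lambda ab: ab[1] - ab[0])
        return (a + b) / 2

    def tell(self, x, y):
        self.data[x] = y

    def loss(self):
        # Naive loss: largest jump in y between neighbouring points.
        if len(self.data) < 2:
            return float("inf")
        xs = sorted(self.data)
        return max(abs(self.data[b] - self.data[a]) for a, b in zip(xs, xs[1:]))


def balancing_runner(learners, functions, n_evaluations):
    # The runner pairs learners with functions and decides where to spend effort.
    for _ in range(n_evaluations):
        i = max(range(len(learners)), key=lambda k: learners[k].loss())
        x = learners[i].ask()
        learners[i].tell(x, functions[i](x))


learners = [MidpointLearner((0.0, 1.0)), MidpointLearner((0.0, 1.0))]
functions = [lambda x: x, lambda x: 10 * x]
balancing_runner(learners, functions, n_evaluations=12)
```

With this toy loss the steeper function receives most of the evaluations, and neither learner needs a `__call__` that makes sense for the combined object.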

originally posted by Anton Akhmerov (@anton-akhmerov) at 2018-08-21T19:59:29.253Z on GitLab

Sounds like a reasonable design goal. We will need to think about the reorganization:

* Who would be responsible for plotting the balanced learners?
* Who would be responsible for creating a collection of learners from combination of parameter values of a function?
* What are the responsibilities of a learner, runner, and executor? Can we provide a short and exhaustive description of those?

originally posted by Christoph Groth (@cwg) at 2018-08-21T20:26:21.309Z on GitLab

Two additions to what I wrote above

While I'm sure that learners shouldn't store functions, I do not insist on what I wrote about balancing being better done in runners. As we saw in the chat, there are cases where wrapping learners in other learners can be useful. For example, there could be a learner that applies some more basic learner to a whole family of similar functions that are parametrized somehow.

Here's a symptom of the problems with current BalancingLearner. It is initialized with a sequence of learners. The .function attribute is initialized as partial(dispatch, [l.function for l in self.learners]). This is a callable object that calls a function that calls a user-provided function.

Now if the learner is made by from_product, the user provides a callable that implements the function family. That function is wrapped in partial so that the .function attributes of the child learners are deeply nested and opaque objects. The actual user-provided function is called only after three function calls that serve no purpose.
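
The nesting can be reproduced in a few lines. This is only a sketch: `dispatch` below is written from the description above (it assumes points arrive as `(index, x)` pairs), and `family` is a hypothetical user function; neither is copied from the adaptive source.

```python
from functools import partial


def family(x, offset):
    # Hypothetical user-provided function family, parametrized by `offset`.
    return x + offset


def dispatch(child_functions, arg):
    # Assumed shape of the dispatcher: pick the child function by index.
    index, x = arg
    return child_functions[index](x)


# What from_product effectively builds: one partial per parameter combination...
child_functions = [partial(family, offset=o) for o in (0.0, 1.0, 2.0)]
# ...and another partial around dispatch for the balancing learner itself.
function = partial(dispatch, child_functions)

# Call chain for one point: partial -> dispatch -> partial -> family.
result = function((2, 0.5))
```

Recovering `family` from `function` means digging through `function.args[0][2].func`, which is the opacity described above.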

I propose to replace this application of BalancingLearner with a ComboLearner (bad name) that would approximate a family of functions parametrized by a "combo" of parameters. The combo learner would be initialized like `BalancingLearner.from_product`, except that the function that it has to learn would be provided via the runner.

originally posted by Christoph Groth (@cwg) at 2018-08-21T20:49:20.684Z on GitLab

Anton wrote:

> Who would be responsible for plotting the balanced learners?

Can today's BalancingLearner plot itself in the general case, when it's not made by from_product?

The learning of a parametrized function family seems to be best realized by a ComboLearner that works similarly as today's balanced learner but without the double wrapping of the user-provided function. It seems to me that the plotting should best remain there.

> Who would be responsible for creating a collection of learners from combination of parameter values of a function?

The ComboLearner. It could even be called BalancingLearner but I think that another name would be clearer, perhaps ParametrizedLearner?

> What are the responsibilities of a learner, runner, and executor? Can we provide a short and exhaustive description of those?

Learner: Knows how to efficiently approximate some class of mathematical functions in some given interval. It can be "asked" for points and "told" results. It provides functionality to query the learned data in useful ways: at the least it provides `__call__` and `__loss__` but might provide further functionality like `integral`, `abserr` and `relerr` for the integrating learner. It also knows how to plot itself. (Or alternatively provides the information needed to plot it.)

Runner: Knows how to evaluate some type (in the technical sense) of function in some context (locally, asynchronously, using an executor, over MPI, ...).

Executor: Runners that internally use an executor to evaluate the function can be provided one.
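
Written down as interfaces, this split might look roughly like the following. It is a sketch using `typing.Protocol`; the method names loosely follow the description above (a plain `loss` method stands in for `__loss__`), while the exact signatures, the `Runner.run` method, `LocalRunner` and `TableLearner` are assumptions made for the example.

```python
from typing import Any, Callable, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Learner(Protocol):
    """Approximates a function; holds data, never the function itself."""

    def ask(self, n: int) -> Sequence[Any]: ...
    def tell(self, x: Any, y: Any) -> None: ...
    def loss(self) -> float: ...
    def __call__(self, x: Any) -> Any: ...


class Runner(Protocol):
    """Knows how to evaluate one kind of function in one kind of context."""

    def run(self, learner: Learner, goal: Callable[[Learner], bool]) -> None: ...


class LocalRunner:
    """Hypothetical blocking runner for plain Python callables."""

    def __init__(self, function: Callable[[Any], Any]):
        self.function = function

    def run(self, learner: Learner, goal: Callable[[Learner], bool]) -> None:
        while not goal(learner):
            for x in learner.ask(1):
                learner.tell(x, self.function(x))


class TableLearner:
    """Hypothetical learner over the integers 0, 1, 2, ...; predicts by lookup."""

    def __init__(self):
        self.data = {}

    def ask(self, n):
        start = len(self.data)
        return list(range(start, start + n))

    def tell(self, x, y):
        self.data[x] = y

    def loss(self):
        return 1.0 / (1 + len(self.data))

    def __call__(self, x):
        return self.data[x]


learner = TableLearner()
LocalRunner(lambda x: 2 * x).run(learner, goal=lambda lrn: lrn.loss() < 0.2)
```

A runner that uses an executor would take it as a constructor argument in the same way `LocalRunner` takes the function.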

originally posted by Anton Akhmerov (@anton-akhmerov) at 2018-08-21T20:53:35.908Z on GitLab

> Can today's BalancingLearner plot itself in the general case, when it's not made by from_product?

Yes, but it is less nice, it presents a bunch of plots.

> The learning of a parametrized function family seems to be best realized by a ComboLearner that works similarly as today's balanced learner but without the double wrapping of the user-provided function. It seems to me that the plotting should best remain there.

Yes, this seems to be the main useful case for balancing.

originally posted by Bas Nijholt (@basnijholt) at 2018-09-24T11:38:55.719Z on GitLab

We've been having this discussion for quite a while and there are reasons for doing both.

# Why to leave the `function` in the `Learner`
* it's nice to keep the function with the learner, easier for the user (see example below)
* the learner has documentation about what kind of function will work with it
* has a nicer interface for the `BalancingLearner`:

```python
learners = [Learner1D(partial(h, offset=random.uniform(-1, 1)), bounds=(-1, 1)) for i in range(10)]
learner = BalancingLearner(learners)
runner = Runner(learner)
```

vs

```python
learners = [Learner1D(bounds=(-1, 1)) for i in range(10)]
learner = BalancingLearner(learners)

# Looping over functions again and adding the need to make the Runner aware of handling iterables of functions
runner = Runner(learner, [partial(h, offset=random.uniform(-1, 1)) for i in range(10)])
```

# Why to put the `function` in the `Runner`
* one could pickle the entire learner as it is just data
* the learner doesn't actually "need" to know about the function

### Why I think having the function with the learner is nicer

I often run simulations in notebooks where I define all functions in a module and then define the entire simulation in a cell by creating a learner.

For example (I leave out large parts of code, but leave in enough to get the picture across):

`cell 1`

```python
syst_pars = dict(root_dir='~/Work/quasi_majo_potential/side_gates/geo_/', a=7,
                 r_sc=80, coverage_angle=120, angle=-30, L_cut=-600, with_holes=True)

params = dict(alpha=20, mu_sc=100, g=10, B_y=0, B_z=0,
              Delta=0.42, **funcs.constants_InAs)

combos = dict(
    V_plunger=np.arange(-10, 1, 1),
    V_cutter=np.arange(-10, 1, 1),
    orbital=[True, False],
)

f = funcs.change_conductance_template(funcs.smallest_gap, ['B_x'], combos, syst_pars, params, [])
lkwargs = dict(bounds=[0, 4], loss_per_interval=funcs.abs_min_log_loss)
learner = adaptive_tools.BalancingLearner.from_product(
    f, learner_type=adaptive_tools.Learner1D, learner_kwargs=lkwargs, combos=combos)

folder = 'data/quasi-majoranas'
learner.load(folder)
```

`cell 2`

```python
syst_pars = dict(root_dir='~/Work/quasi_majo_potential/side_gates/geo_/', a=5,
                 r_sc=80, coverage_angle=120, angle=-30, L_cut=-600, with_holes=True)

params = dict(alpha=20, mu_sc=100, g=10, B_x=0, B_y=0, B_z=0,
              V_cutter=0, **funcs.constants_InAs)

def lowest_energy(x, syst_pars, params):
    import funcs, common
    import numpy as np
    lead = funcs.make_wire_from_cad(**syst_pars).leads[1]
    params['V_plunger'], params['Delta'] = x
    ... # stuff here
    return np.abs(ev).min()

f = partial(lowest_energy, syst_pars=syst_pars, params=params)
learner = adaptive_tools.Learner2D(f, [(-10, 0), (0, 4)])
learner.load('data/gap_fit.pickle')
```

`cell 3, 4, ...`

```python
# more simulations defined by learner(s)
```

Then finally I would have only one cell to connect to the cluster (I left out some code) and start the runner with just

```python
client, dview, lview = hpc05.start_remote_and_connect(n=150, hostname='hpc05', profile='pbs')
runner = adaptive_tools.Runner(learner, goal=lambda l: False, executor=client)
save_task = runner.start_periodic_saver(save_kwargs=dict(folder=folder), interval=1800)
# more stuff with the runner here
```

tl;dr: I like to define the entire simulation with the learner and then the complete running part is generic.

I also did this in a few publications where someone would just run a cell to define the learner which would load the data. Then if one would like to recreate the data one simply doesn't load the data and goes `runner = Runner(learner)`.

I am in favor of leaving the function in the learner, what do you guys think, @jbweston, @anton-akhmerov, @cwg?

originally posted by Bas Nijholt (@basnijholt) at 2018-09-24T11:46:29.854Z on GitLab

I also implemented this in https://gitlab.kwant-project.org/qt/adaptive/merge_requests/110.

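It shows for example how even the `runner.simple` becomes quite complicated:

```python
import collections.abc
import functools


def simple(learner, goal, function):
    # `dispatch` is the same helper that BalancingLearner uses to route a point to one of several functions.
    # collections.Iterable was removed in Python 3.10; the ABC lives in collections.abc.
    if isinstance(function, collections.abc.Iterable):
        _function = functools.partial(dispatch, list(function))
    else:
        _function = function

    while not goal(learner):
        xs, _ = learner.ask(1)
        for x in xs:
            y = _function(x)
            learner.tell(x, y)
```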
