originally posted by Joseph Weston (@jbweston) at 2018-01-31T12:46:30.169Z on GitLab
> One could then stop using an ABC for learners and other packages could contain compatible learners and runners without even explicitly depending on "adaptive"
But an ABC is such a specification. The problem of explicitly depending on adaptive is solved by us implementing `__subclasshook__` for the `Learner` ABC. This is IMO superior to merely documenting what the learner interface is.
> - The learner defines the "done" condition
>   `learner = SomeLearner(a, b, done_condition)`
I am not sure that the learner is the correct place for this. Currently in adaptive the end condition is supplied to the runner; I believe that this is the correct place for it.
> - The learner itself is an iterator (not an iterable) that chooses new points on request.
I'm not sure that this is a good idea. This means that the following:

```python
for point in learner:
    pass
```

changes the state of the learner. I understand that this is generally true for iterators, but I think it is very confusing.
Also, in the above, the learner is the only one who can decide how many points to give back in each call to `next()` (in the above code it is not obvious that `point` is a sequence, but I saw that later in your post you said that this could be the case). This would seem a valid decision for cquad, where the learner naturally wants to give back all the points in an interval, but I'm not sure that it is the best choice for other learners.
For example, imagine we have a 1D learner with just the boundary points evaluated and we want to choose the next 2 points before giving back any data (so that the runner can get 2-fold parallelism). If we ask the learner for both points at once (i.e. `learner.choose_points(2)`) then it can sensibly choose to divide the interval into 3 parts:

```
x--------x -> x--x--x--x
```

However, if we have to do 2 separate calls to `learner.choose_points(1)`, then the only thing the learner can do is interval bisection:

```
x--------x -> x----x----x -> x--x--x----x
```

which leads to a suboptimal distribution of points, given the current information! In this example we get "better" results by allowing the runner to decide how many points it wants, but allowing the learner to decide how to distribute those points.
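To illustrate with a sketch (the `choose_points` signature is hypothetical, and a real learner would pick the largest-loss interval rather than splitting uniformly):

```python
import numpy as np

def choose_points(a, b, n):
    """Hypothetical: return n new points splitting [a, b] into n + 1 equal parts."""
    return list(np.linspace(a, b, n + 2)[1:-1])

print(choose_points(0, 1, 2))  # [0.333..., 0.666...]  x--------x -> x--x--x--x
print(choose_points(0, 1, 1))  # [0.5]                 x--------x -> x----x----x
```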
In the cquad example I could imagine that we would write the learner algorithm purely in terms of intervals, and there would be a "middleware" sitting between that and the runner that would work out the points->intervals mapping. The runner requests points from it, and it then requests intervals from the learner. Does that make sense?
> - It has a method to feed back results. (The requested points are immutable objects that also serve as a handle when feeding back results.)
This already exists; it is called `add_point` currently, but perhaps `feed` is a better name (we have up to now not been consistent about what we call elements of the domain and codomain of the learned function, and pairs of the same).
> When called like a function, the learner predicts values.
This is a nice addition! @anton-akhmerov reminded me that this does not always make sense, though, because not all learners are trying to interpolate a function, e.g. `AveragingLearner`.
originally posted by Christoph Groth (@cwg) at 2018-02-01T23:20:08.574Z on GitLab
> > One could then stop using an ABC for learners and other packages could contain compatible learners and runners without even explicitly depending on "adaptive"
> But an ABC is such a specification. The problem of explicitly depending on adaptive is solved by us implementing `__subclasshook__` for the `Learner` ABC. This is IMO superior to merely documenting what the learner interface is.
My point is that simplifying the learner interface could turn it into an unofficial standard. Then other libraries could become interoperable with adaptive without either depending on the other. The subclasshook mechanism only works in retrospect: adaptive has to explicitly accept specific external classes as valid learners.
Of course it's OK to depend on adaptive, but it's better to avoid dependencies if there is no good reason for them.
> > - The learner defines the "done" condition
> >   `learner = SomeLearner(a, b, done_condition)`
>
> I am not sure that the learner is the correct place for this. Currently in adaptive the end condition is supplied to the runner; I believe that this is the correct place for it.
But the runner has no understanding at all of the class of functions that is being learned; that is the learner's field of expertise. That's why "goals" will always depend on the specific learner type and as such (if generally useful) are best implemented as methods of it.
Of course, with the current design you can also write:

```python
Runner(..., goal=Learner.some_goal)
```

but I don't see how it helps the runner to be able to trigger the goal check at will. That check is better triggered by the learner, because it knows best when and how often this should be done. For example, in cquad the error estimate only changes when an interval gets updated, so there's no point in checking more often than that. The runner has no understanding of that.
As far as custom goals that have not been foreseen by the learner's author are concerned, there are multiple possible solutions, but the most elegant seems to me to subclass a learner, as sketched below. This makes it explicit that the new goal is specific to a learner.
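A sketch of that approach, using adaptive's `Learner1D` as the base class (the goal method itself is hypothetical, and the runner call assumes adaptive's current API):

```python
import adaptive

class MyLearner(adaptive.Learner1D):
    def tight_loss_goal(self):
        # a custom end condition, specific to this learner type
        return self.loss() < 0.01

learner = MyLearner(lambda x: x**2, bounds=(-1, 1))
adaptive.runner.simple(learner, goal=MyLearner.tight_loss_goal)
```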
> > - The learner itself is an iterator (not an iterable) that chooses new points on request.
>
> I'm not sure that this is a good idea. This means that `for point in learner: pass` changes the state of the learner. I understand that this is generally true for iterators, but I think it is very confusing.
You have a good point here.
When I started to think about the learner interface I was looking for a pythonic way to abstract a consumer-producer, i.e. an object that provides things on request and accepts things that are fed to it (like a file, except that a file works with streams of characters and not objects).
My first idea was to use a (generator-like) coroutine, which, as you know, can not only `yield` objects but can also be fed objects. The problem is that Python's syntax for using a generator in this way is not nice, as the sketch below illustrates.
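For illustration, a sketch of the coroutine variant (all names hypothetical; the point-choosing rule is a trivial stand-in that ignores the values it receives):

```python
def learner():
    """A coroutine-style learner: yields bunches of points, is sent values."""
    xs = [0.0, 1.0]
    while True:
        new = [(a + b) / 2 for a, b in zip(xs, xs[1:])]  # all midpoints
        values = yield new          # hand out points, receive values back
        xs = sorted(xs + new)

gen = learner()
points = next(gen)                  # priming call: awkward and easy to forget
for _ in range(3):
    values = [x * x for x in points]   # evaluate f(x) = x**2
    points = gen.send(values)          # feeding back and requesting are conflated
```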
Perhaps you have a better idea? In C++ I would write something like:

```cpp
while (points = learner.get()) {
    values = evaluate(points);
    learner.feed(points, values);
}
```
The nice thing here is that getting no points is also the exit condition for the loop. If only Python's assignments were expressions... But they aren't, so one would have to write:

```python
while True:
    points = learner.get()
    if not points:
        break
    values = evaluate(points)
    learner.feed(points, values)
```
The iterator trick allows this to be simplified to:

```python
for points in learner:
    values = evaluate(points)
    learner.feed(points, values)
```
The C++ version is indeed nicer because `get()` is explicit. But I consider the Python version OK; at least it's better than any alternative I can think of.
> For example, imagine we have a 1D learner with just the boundary points evaluated and we want to choose the next 2 points before giving back any data (so that the runner can get 2-fold parallelism). If we ask the learner for both points at once (i.e. `learner.choose_points(2)`) then it can sensibly choose to divide the interval into 3 parts: `x--------x -> x--x--x--x`
>
> (...)
The problem with this approach is that it's nondeterministic. The learned function will depend on the history of available resources. I think it is a problem if a program that fails for some input cannot be rerun to reproduce the bug.
But your point is valid. One can imagine cases where it is useful for the learner to know how many evaluations can be done in parallel right now. In the C++ interface sketched above I would add a parameter to `get` that defaults to 1 and specifies a "soft" maximum number of points. The learner would try to provide a number of points that is as close to but not higher than that maximum. But it would still work in full "chunks", i.e. the first request from cquad would yield 33 points.
How about this:

```python
while learner.unsatisfied():
    points = learner.get(n)
    values = evaluate(points)
    learner.feed(points, values)
```
Unfortunately, this is getting much more complicated than my original proposal. I have the impression that the simple original interface would work well in practice.
> In the cquad example I could imagine that we would write the learner algorithm purely in terms of intervals, and there would be a "middleware" sitting between that and the runner that would work out the points->intervals mapping. The runner requests points from it, and it then requests intervals from the learner. Does that make sense?
Yes, but what's your goal here?
> > When called like a function, the learner predicts values.
>
> This is a nice addition! @anton-akhmerov reminded me that this does not always make sense, though, because not all learners are trying to interpolate a function, e.g. `AveragingLearner`.
I think that it's a nice and crisp definition to say that learners learn a function `y = f(x)`. The `AveragingLearner` does not do that. Its `x` is some pseudo-random seed that the averaging learner actually does not seem to need -- it never requests points with the same seed twice.

I'm not saying that the `AveragingLearner` is not useful. But I think that it should be implemented as a learner of a noisy function of no variable, i.e. `x` would always be `None` or an empty sequence. Hence, to get the estimate of the average one would evaluate `learner(None)`. IMHO this corresponds better to what `AveragingLearner` does.
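A minimal sketch of this reading (class and method names hypothetical):

```python
import random
import statistics

class NullaryAveragingLearner:
    """Learns a noisy function of *no* variable; the 'point' is always None."""

    def __init__(self, f):
        self.f, self.values = f, []

    def __iter__(self):
        return self

    def __next__(self):
        return (None,)               # request one more sample of f()

    def feed(self, points, values):
        self.values.extend(values)

    def __call__(self, x=None):
        return statistics.mean(self.values)   # current estimate of the average

learner = NullaryAveragingLearner(lambda: random.gauss(0.0, 1.0))
for _, points in zip(range(100), learner):
    learner.feed(points, [learner.f() for _ in points])
print(learner(None))                 # close to 0.0
```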
There could of course also be an averaging learner of a function of a variable, e.g. the noisy measurement of a current given some gate voltage.
originally posted by Anton Akhmerov (@anton-akhmerov) at 2018-02-02T08:37:29.136Z on GitLab
> I'm not saying that the `AveragingLearner` is not useful. But I think that it should be implemented as a learner of a noisy function of no variable, i.e. `x` would always be `None` or an empty sequence. Hence, to get the estimate of the average one would evaluate `learner(None)`. IMHO this corresponds better to what `AveragingLearner` does.
Good point; it sounds very reasonable that `learner()` should return the average of the function that is learned (and probably have a more detailed interface for more detailed information).
How to deal with the need to keep track of the collection of prng seeds? What should keep track of them?
originally posted by Joseph Weston (@jbweston) at 2018-02-02T09:42:13.080Z on GitLab
> The subclasshook mechanism only works in retrospect: adaptive has to explicitly accept specific external classes as valid learners
This claim is demonstrably false:
```python
import abc

class Animal(abc.ABC):
    @classmethod
    def __subclasshook__(cls, obj):
        return any(hasattr(parent, 'make_noise') for parent in obj.__mro__)

    @abc.abstractmethod
    def make_noise(self):
        pass

class Duck:
    def make_noise(self):
        return "quack"

duck = Duck()
print(isinstance(duck, Animal))  # True, even though Duck was never registered
```
Also the Python docs say so.
I maintain that using an ABC is as good a documentation as any for describing our "learner protocol".
> The problem with this approach is that it's nondeterministic. The learned function will depend on the history of available resources.
I agree that it will not be deterministic from the user's perspective; however, I do not perceive this as a problem. A given run produces a given sequence of calls to the learner's API (e.g. `get(3)`, `feed(...)`, `get(2)`, `feed(...)`, ...), and if you create a new learner and repeat the exact same sequence of calls, then the learners will end up in the same state.
> I think it is a problem if a program that fails for some input cannot be rerun to reproduce the bug.
At the moment we get around this by having the option of maintaining a log of the calls made to the learner (sketched below). While this may not be practical for every run (you essentially need to store all the data twice), it has so far been a reasonable compromise between reproducibility and usefulness.
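A minimal sketch of such a log and its replay (the entry format is an assumption; the actual logging may differ):

```python
def replay_log(learner, log):
    """Re-apply a recorded sequence of method calls to a fresh learner.

    Each entry is assumed to be a tuple (method_name, *args),
    e.g. ("choose_points", 2) or ("feed", points, values).
    """
    for method, *args in log:
        getattr(learner, method)(*args)
```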
> But the runner has no understanding at all of the class of functions that is being learned.
Yes, but the user does. The user is the one who provides the goal. Here are some examples of reasonable goals that we have already used in adaptive:

1. the learner's loss estimate drops below a given value,
2. the number of evaluated points exceeds a given value,
3. a given amount of wall-clock time has elapsed.
I believe that (1) is the type of goal that you were considering up till now. At the moment the "loss" is just an arbitrary figure of merit that cannot really be compared between learners (not even necessarily between learners of the same class); however, I believe that it would make sense to "standardize" this somehow (e.g. make it so that a loss of 1 means that the learner's "internal goal" has been reached). However, I believe that it should totally be possible to keep requesting points, even after the learner's "internal goal" has been reached. The user is king, and if they decide that they want more points then they should get more points, dammit!
(2) is also a very common goal (we often use it to compare a learner's performance against a naive gridding of parameter space). I think you will agree that supplying `lambda learner: learner.n > 1000` is more lightweight than subclassing the learner and overriding a goal method.
(3) is clearly outside the remit of the learner, and there is no natural place to "start the timer" if the logic lives inside the learner. We could imagine a learner that takes the execution time of different points into account, but this is, IMO, a separate consideration. All three kinds of goals are sketched below.
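For illustration, the three kinds of goals as runner-side callables (a sketch; `loss()` and `n` are assumed learner attributes):

```python
import time

def loss_goal(learner):                    # (1) precision reached
    return learner.loss() < 0.01

def npoints_goal(learner):                 # (2) point budget exhausted
    return learner.n > 1000

t_start = time.monotonic()
def time_goal(learner):                    # (3) wall-clock time elapsed
    return time.monotonic() - t_start > 3600
```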
originally posted by Joseph Weston (@jbweston) at 2018-02-02T10:00:45.189Z on GitLab
> > In the cquad example I could imagine that we would write the learner algorithm purely in terms of intervals, and there would be a "middleware" sitting between that and the runner that would work out the points->intervals mapping. The runner requests points from it, and it then requests intervals from the learner. Does that make sense?
>
> Yes, but what's your goal here?
That the learner can return objects that are not mapped 1-1 onto "points" that can be evaluated by the runner.
The learner's internal algorithm is not necessarily easily expressed in terms of points, but in terms of other objects such as intervals. OTOH a runner needs to get points so that it can call the function and get values.
originally posted by Christoph Groth (@cwg) at 2018-04-19T11:43:57.486Z on GitLab
I only knew two ways of becoming an instance of an ABC: deriving from it and using `register()`. Thanks for teaching me the third way: `__subclasshook__`. I hope that I know all of them now.
However, I have to say that `__subclasshook__` feels like a hack that defies the whole idea of ABCs, namely having an explicit statement of compliance that goes beyond duck typing. With the first two methods of declaring a class to adhere to an ABC, there is someone who inspects the class in question and then states that it adheres to the protocol specified by the ABC.

Using `__subclasshook__` reduces ABCs to duck typing, so what's the point, other than providing a way to migrate an existing library from duck typing to ABCs without breaking compatibility?
But OK, since my proposal is to use duck typing for learners, an ABC with `__subclasshook__` works just as well.
originally posted by Christoph Groth (@cwg) at 2018-04-19T12:03:19.695Z on GitLab
Joseph wrote:
> I agree that it will not be deterministic from the user's perspective; however, I do not perceive this as a problem. A given run produces a given sequence of calls to the learner's API (e.g. `get(3)`, `feed(...)`, `get(2)`, `feed(...)`, ...), and if you create a new learner and repeat the exact same sequence of calls, then the learners will end up in the same state.
I think that this is too weak. It means that when you do computations on a cluster and start getting weird results, the problem will be extremely difficult to debug. If you're lucky you can reduce the size of the problem and run it on a single CPU, but what do you do if this is not possible?
But even if there are no bugs, being non-deterministic is highly problematic. So you run a simulation and it gives you different results each time, even when starting from exactly the same state? That's like working with an analog computer!
There are certainly problems where non-deterministic concurrent algorithms may be the only way to go, but I do not have the impression that learners belong to that category. Let's not open the can of worms of non-determinism without need!
That brings me to the following. Further up I wrote:
> One can imagine cases where it is useful for the learner to know how many evaluations can be done in parallel right now. In the C++ interface sketched above I would add a parameter to `get` that defaults to 1 and specifies a "soft" maximum number of points. The learner would try to provide a number of points that is as close to but not higher than that maximum. But it would still work in full "chunks", i.e. the first request from cquad would yield 33 points.
I now think that I was wrong there. There's no need for any parameter to `get()`. It can be useful to provide some typical chunk size or a similar hint upfront, but if determinism is to be preserved, any hint provided during evaluation cannot have any influence on how the learner continues to work.

So sure, one could provide the number of currently free cores to `get()`, but this could serve only as a mere optimization to avoid calling `get()` multiple times. That seems unnecessary, since an appropriate chunk size should be selected before learning starts.
originally posted by Christoph Groth (@cwg) at 2018-04-19T14:50:11.555Z on GitLab
Joseph, concerning the four types of goals that you raised, I think that they are perfectly compatible with the iterator API. (Although I consider usage type 2 useful only for benchmarking learners.)
A reasonable default "loss" in that case would be zero, i.e. continue forever. Then case 1 would be handled by specifying a different precision goal, and cases 2 to 4 would be handled by breaking out of the loop when the appropriate condition is met.
You say that you find it problematic that

```python
for point in learner:
    pass
```

changes the state of the learner. I guess that you would prefer instead

```python
while point := learner.get():
    pass
```

if only that were valid Python.
Now it happens that PEP 572 (currently in the works) proposes to make the above valid Python, and one strong argument against it is that assignment expressions are not really necessary in the above case, because iterators already provide a way to do the same. So if you can convince me why the second syntax is actually better than the first one, I'd actually be happy, because that would give me an argument in support of PEP 572!
The second syntax allows values to be fed back to the "iterator", but I actually can't think of that being very useful in general. (As I detailed in a comment above, IMO it's not useful in the case of learners.)
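(For reference: PEP 572 was eventually accepted, so from Python 3.8 on the spelling above is valid; `get`, `evaluate` and `feed` are still the hypothetical API from the discussion above.)

```python
# Valid since Python 3.8 (PEP 572, assignment expressions)
while points := learner.get():
    values = evaluate(points)
    learner.feed(points, values)
```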
originally posted by Joseph Weston (@jbweston) at 2018-05-23T14:09:11.898Z on GitLab
`scikit-optimize` has a similar architecture to ours (though their learner is narrower in scope). They have the methods `ask` and `tell`, which I think are nice and succinct.
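For comparison, a minimal ask/tell loop with scikit-optimize's `Optimizer` (a sketch based on its documented API):

```python
from skopt import Optimizer

opt = Optimizer(dimensions=[(-2.0, 2.0)])   # a 1D search space
for _ in range(10):
    x = opt.ask()                # request the next point to evaluate
    y = (x[0] - 0.5) ** 2        # evaluate the objective ourselves
    opt.tell(x, y)               # feed the result back
```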
(original issue on GitLab)
opened by Christoph Groth (@cwg) at 2018-01-20T10:03:09.612Z
It seems to me that "adaptive" could greatly profit by becoming trivially extensible.
The adaptive package consists of a collection of learners and a collection of runners. I think that "adaptive" should not strive to contain all learners and runners that are useful to someone. Rather, it seems to me that it would create the most value by establishing a simple notion of a "learner" and providing a set of generally useful learners as well as generally useful runners.
One can easily imagine learners and runners that are beyond the scope of a single package. Even if one believes that all of them should be included in "adaptive", my point is that adaptive evaluation is such a general idea that it goes beyond the scope of any single package. This is why I suggest specifying a simple and "natural" basic runner-learner interface, such that it becomes trivial to write a simple runner. One could then stop using an ABC for learners, and other packages could contain compatible learners and runners without even explicitly depending on "adaptive", while "adaptive" would remain useful together with them.
If the learner interface is simple enough, third-party packages will hopefully adopt it instead of using callback functions.
What is a learner? It seems to me that a learner is best seen as something that, in accordance with some internal model of a function, can intelligently query points of that function and, based on partial or complete results of the queries, predict the function. Thus, the core business of a learner consists of choosing points to evaluate, accepting the results of those evaluations, and predicting the function from them.
In today's "adaptive", in addition to the above, learners have to do considerable bookkeeping (because results may be fed back arbitrarily while the learner might need specific groups of them).
What would be a natural, almost invisible, interface to the above functionality? How about the following, from the point of view of a trivial runner?
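A minimal sketch of such a runner loop (assuming `f` is the function being learned):

```python
for points in learner:               # the learner yields bunches of points
    values = [f(x) for x in points]  # evaluate the function on each point
    learner.feed(points, values)     # ... and feed the results back
```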
The above pretty minimal interface to a learner consists of the following parts:

- the iterator protocol: each call to `next()` yields a bunch of points to be evaluated,
- a `feed` method that accepts the results for a previously yielded bunch,
- a function call, `learner(points)`, that returns the learner's current prediction.
In addition to this strict minimum, there could be some optional functionality, like estimating the intermediate quality of the prediction.
Compared to my earlier attempts at adaptive evaluation, you guys have convinced me on several points.
However, I still think that it is useful if learners are able to provide points in bunches and expect them back in the same bunches. Moreover, this implicitly establishes a common priority for all the points of a bunch. At the same time, as subsequent bunches are requested without feeding back results, these have a lower and lower relative priority.
A useful corollary of this is that it is now naturally possible to drive a learner with the least possible amount of speculation while still exploiting parallelism. Consider the situation where there are only a few available cores per active learner. We want to exploit some parallelism, but without becoming too speculative, because that would increase the overall workload and ultimately slow down the whole process. With the above protocol, this means that the runner should request only one bunch of points at a time for each learner, and only request the next bunch when the results have been fed back. In the common case where this provides enough work to occupy all available cores, it is the most efficient approach.
Note that there is no way to do the same with current "adaptive", because it has no notion of priority, even an implicit one. The best one can do when there are a few active learners is to request one point at a time in round-robin fashion. This is inefficient not only because of the lack of vectorization, but also because learner A might need only one point right now without speculating, while learner B might need three.
So it seems to me that points should arrive in bunches (i.e. a Python sequence), and should be fed back in the same bunches. In practice, one can assume for many learners and runners that points are sequences (of sequences ...) of numbers, i.e. numerical arrays. (For safety, it's useful if these arrays are immutable.)
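With NumPy, for instance, arrays can be made read-only:

```python
import numpy as np

points = np.linspace(0, 1, 5)
points.setflags(write=False)   # mark the bunch as read-only
# points[0] = 3.0  # would now raise "ValueError: assignment destination is read-only"
```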
The goal of a learner is to predict a function. What is the most natural interface to a predicted function? That of a function call: `learner(points)`.

To summarize, I propose to define learners as iterators that yield sequences of points to be evaluated. The results are fed back using a `feed` method. The learner's current prediction can be queried by a simple function call. I believe that this minimal interface encapsulates all the essential functionality of a learner for a wide spectrum of applications. It is sufficient for all kinds of synchronous and asynchronous runners as well as just-in-time function plotters, etc.
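To make the proposal concrete, here is a minimal hypothetical learner of a 1D function implementing exactly this interface: iteration yields bunches of points, `feed` accepts the results, and calling the learner predicts by linear interpolation. (A real learner would of course choose points by expected loss reduction rather than by uniform bisection.)

```python
import numpy as np

class MinimalLearner1D:
    """Sketch of the proposed protocol: iterator + feed + call (hypothetical)."""

    def __init__(self, bounds):
        self._pending = tuple(bounds)   # first bunch: the boundary points
        self.xs = []                    # points already handed out
        self.ys = {}                    # x -> f(x) for evaluated points

    def __iter__(self):
        return self

    def __next__(self):
        if self._pending is not None:
            bunch, self._pending = self._pending, None
        else:
            # naive rule: the midpoints of all current adjacent pairs
            xs = sorted(self.xs)
            bunch = tuple((a + b) / 2 for a, b in zip(xs, xs[1:]))
        self.xs.extend(bunch)
        return bunch

    def feed(self, points, values):
        self.ys.update(zip(points, values))

    def __call__(self, points):
        xs = sorted(self.ys)            # predict by linear interpolation
        return np.interp(points, xs, [self.ys[x] for x in xs])

learner = MinimalLearner1D((0.0, 1.0))
f = lambda x: x * (1.0 - x)
for _, points in zip(range(4), learner):         # run four bunches
    learner.feed(points, [f(x) for x in points])
print(learner([0.25, 0.5]))                      # -> [0.1875, 0.25]
```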