basnijholt opened this issue 6 years ago
Adaptive looks super cool! And yes, it fills one more step of automation that xyzpy still lacks -- actually choosing which set of combos to run... This occurred to me at one point, but I thought it was too involved, so I'm glad to see you are doing it! The dream of setting and forgetting a computer to intelligently harvest labelled data for you approaches ever closer.
A few questions (sorry if these are basic, I haven't had time to properly look through adaptive):
A final thought is that the completely dynamic nature of the coordinates might become inefficient for the 'gridded' nature of xarray for many dimensions. More suitable for a sparse/table representation maybe - or I guess the starting point for interpolation. Have you had ideas in this direction?
J
By the way, your field looks like it might be quite close to mine (see my other package quimb) thus the similar ideas maybe!
> How does the learning work? Does it need e.g. scalar/smooth/float output?
Depends on the learner's algorithm. Right now we mostly have sampling that prioritizes discontinuities in the data, but this could be controlled. I also imagine developing specialized algorithms; for example, we're now working on one for preferential band-structure sampling.
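To illustrate the idea (this is a hand-rolled sketch, not adaptive's actual implementation): a toy 1D sampler that always bisects the interval whose endpoint values differ the most, so evaluations pile up around jumps and steep regions:

```python
# Illustrative sketch (not adaptive's real code): a toy 1D sampler that
# keeps bisecting the interval whose endpoints differ the most, so points
# cluster around discontinuities in the data.

def sample_adaptively(f, a, b, n_points):
    xs = [a, b]
    ys = [f(a), f(b)]
    while len(xs) < n_points:
        # "loss" of each interval: |dy| (prioritizes jumps in the data)
        losses = [abs(ys[i + 1] - ys[i]) for i in range(len(xs) - 1)]
        i = max(range(len(losses)), key=losses.__getitem__)
        x_new = (xs[i] + xs[i + 1]) / 2
        xs.insert(i + 1, x_new)
        ys.insert(i + 1, f(x_new))
    return xs, ys

# A step function: points should concentrate near the jump at x = 0.
step = lambda x: 0.0 if x < 0 else 1.0
xs, ys = sample_adaptively(step, -1.0, 1.0, 20)
near_jump = sum(1 for x in xs if abs(x) < 0.25)
print(near_jump)  # -> 16, i.e. most points end up near the discontinuity
```

A uniform grid with the same budget would place only a handful of its 20 points that close to the jump.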
> Can it be batched (i.e. submit sets of points at once)?
Yes, if the learner supports it, and currently all the algos we implemented do that.
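A sketch of what batching could look like (illustrative only, not the real API): instead of one midpoint, pick the n highest-loss intervals at once, so all n evaluations can be dispatched to workers in parallel before any result comes back:

```python
# Sketch of batched point selection (illustrative, not adaptive's API):
# choose the n worst intervals in one go.

def ask_batch(xs, ys, n):
    """Return n new x-values, the midpoints of the n highest-loss intervals."""
    losses = [abs(ys[i + 1] - ys[i]) for i in range(len(xs) - 1)]
    worst = sorted(range(len(losses)), key=losses.__getitem__, reverse=True)[:n]
    return [(xs[i] + xs[i + 1]) / 2 for i in worst]

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.0, 0.0, 1.0, 1.0, 2.0]
print(ask_batch(xs, ys, 2))  # -> [-0.25, 0.75], midpoints of the two steepest intervals
```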
> And is there a way to generalize to the n-D case?
Yes, definitely, although in higher dimensions (>3) local sampling will become bad because of the curse of dimensionality. There we'd need to think of alternative approaches.
> A final thought is that the completely dynamic nature of the coordinates might become inefficient for the 'gridded' nature of xarray for many dimensions. More suitable for a sparse/table representation maybe - or I guess the starting point for interpolation. Have you had ideas in this direction?
I cannot think of anything better than storing the interpolation object in that case.
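One way this could look (a sketch assuming NumPy and SciPy; the function and names here are made up): keep the scattered samples, carry the interpolation object around, and only evaluate it on a regular grid when a gridded view is actually needed:

```python
# Sketch: store an interpolation object for scattered 2D samples and
# grid it on demand (assumes SciPy is available; the data is made up).
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(200, 2))       # scattered sample locations
values = points[:, 0] ** 2 + points[:, 1] ** 2   # f(x, y) = x^2 + y^2

interp = LinearNDInterpolator(points, values)    # this is what gets stored

# Evaluate on a regular grid only when a 'gridded' xarray-style view is needed.
x = np.linspace(-0.5, 0.5, 11)
y = np.linspace(-0.5, 0.5, 11)
X, Y = np.meshgrid(x, y)
Z = interp(X, Y)
print(Z.shape)  # (11, 11)
```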
Nice, thanks for those answers. Don't know if I will get round to this any time soon, but from my perspective a syntax like this might be cool:
```python
combos = {
    'A': [1, 2, 3],
    'B': ['foo', 'bar'],
    't': Adaptive(bounds=(-1, 2), loss=0.05, ...),
}
h.harvest_combos(combos)
```
or for the 2D case:
```python
combos = {
    'A': [1, 2, 3],
    'B': ['foo', 'bar'],
    ('t', 'x'): Adaptive(bounds=[(-1, 1), (-1, 1)], loss=0.05, ...),
}
h.harvest_combos(combos)
```
Though each set of adaptive results would have to be aligned/interpolated to go into the full dataset.
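That alignment step might look something like this (a sketch assuming xarray and SciPy; the data is made up): interpolate each run onto a shared coordinate before concatenating into the full dataset:

```python
# Sketch: two runs of the same quantity sampled at *different* t-values;
# put them on one shared t-axis by interpolation, then concatenate.
# Assumes xarray and SciPy are installed; the data is invented.
import numpy as np
import xarray as xr

t1 = np.array([0.0, 0.3, 1.0])
t2 = np.array([0.0, 0.6, 1.0])
run1 = xr.DataArray(t1 ** 2, coords={"t": t1}, dims="t")
run2 = xr.DataArray(2 * t2, coords={"t": t2}, dims="t")

t_common = np.linspace(0, 1, 5)
aligned = xr.concat(
    [run1.interp(t=t_common), run2.interp(t=t_common)],
    dim="A",  # new dimension distinguishing the two runs
)
print(aligned.shape)  # (2, 5)
```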
In the other direction, xyzpy has support (not so robustly tested) for 'case running', i.e.
```python
h.harvest_cases([{'A': 1, 'B': 'foo', 't': 0.243}, {'A': 2, 'B': 'bar', 't': 0.675}, ...])
# or if you have set xyz.Runner(..., fn_args=('A', 'B', 't'), ...)
h.harvest_cases([(1, 'foo', 0.243), (2, 'bar', 0.675), ...])
```
which would be the natural way for adaptive to call it currently, maybe with a hook to get the result back without extracting it from the dataset.
One more random snippet! This is one way to turn a 2D learner's data into an xarray.Dataset:
```python
import xyzpy as xyz
from xyzpy.gen.case_runner import _cases_to_ds

fn_args = ['x', 'y']
out_name = 'out'

_cases_to_ds(
    results=tuple(learner.data.values()),
    fn_args=fn_args,
    cases=tuple(learner.data.keys()),
    var_names=(out_name,),
    var_coords={},
    var_dims={out_name: []},
)
```
which for the first 2D example in the adaptive notebook produces:
```
<xarray.Dataset>
Dimensions:  (x: 886, y: 879)
Coordinates:
  * x        (x) float64 -1.0 -0.9698 -0.9643 -0.9159 -0.9122 -0.9095 ...
  * y        (y) float64 -1.0 -0.9506 -0.9193 -0.9175 -0.9168 -0.915 -0.9134 ...
Data variables:
    out      (x, y) float64 -1.0 nan nan nan nan nan nan nan nan nan nan nan ...
```
It's pretty inefficient though, due to the non-gridded problem.
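For comparison, here is a sketch of the sparse/table representation mentioned earlier (variable names are my own): a flat 'point' dimension with x and y as ordinary coordinates along it, which avoids padding the grid with NaNs:

```python
# Sketch of the sparse/table alternative: instead of an (x, y) grid that is
# mostly NaN, index the scattered samples by a single 'point' dimension.
# Assumes xarray is installed; the data dict is a stand-in for learner.data.
import numpy as np
import xarray as xr

data = {(0.0, 0.0): 1.0, (0.5, -0.2): 0.3, (-0.1, 0.9): 0.7}  # {(x, y): out}
xy = np.array(list(data.keys()))
ds = xr.Dataset(
    {"out": ("point", np.array(list(data.values())))},
    coords={"x": ("point", xy[:, 0]), "y": ("point", xy[:, 1])},
)
print(dict(ds.sizes))  # {'point': 3} -- no NaN padding
```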
That would be pretty cool indeed :)
Thanks for the suggestions on how to save the data, I've been thinking about a good way of saving (and restoring) the learners a bit lately.
So far I've just been experimenting with pickling the data, which seems to work just fine, but I would prefer a more general data format. I'll experiment with xarray a bit more (although I am pretty busy myself as well).
I am impressed with this package!
In my field we very often do these loops over multiple dimensions and generate many curves for different dimensions.
We (my colleagues and I) tried to tackle a very similar problem to the one xyzpy is trying to solve. We wrote adaptive, which does things similar to xyzpy; the biggest difference is that it can adaptively sample one (or two) of the dimensions.

As an example, I adapted your Basic Output Example to do the same but with adaptive. It creates "learners", which are essentially objects from which you can request new points and to which you can feed back the results.
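This ask/tell pattern can be sketched in plain Python (a toy stand-in, not adaptive's real Learner/Runner classes), including a minimal "runner" loop driving the learner through an executor:

```python
# Toy sketch of the ask/tell protocol (not adaptive's real classes): a
# learner hands out points, an executor evaluates them, results are fed back.
from concurrent.futures import ThreadPoolExecutor

class MidpointLearner:
    """Toy learner: always bisects the widest remaining interval."""
    def __init__(self, bounds):
        self.data = {b: None for b in bounds}  # {x: f(x) or None if pending}

    def ask(self, n):
        """Request n new points to evaluate."""
        pts = []
        for _ in range(n):
            xs = sorted(self.data)
            _, i = max((xs[j + 1] - xs[j], j) for j in range(len(xs) - 1))
            x = (xs[i] + xs[i + 1]) / 2
            self.data[x] = None  # mark as pending
            pts.append(x)
        return pts

    def tell(self, x, y):
        """Feed a result back to the learner."""
        self.data[x] = y

def f(x):
    return x * x

learner = MidpointLearner(bounds=(0.0, 1.0))
learner.tell(0.0, f(0.0))
learner.tell(1.0, f(1.0))

# Minimal "runner" loop: batches of work dispatched to an executor.
with ThreadPoolExecutor(max_workers=4) as ex:
    while len(learner.data) < 17:
        points = learner.ask(4)
        for x, y in zip(points, ex.map(f, points)):
            learner.tell(x, y)

print(len(learner.data))  # -> 18
```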
Then you "learn" the function by creating a Runner (this doesn't block the kernel and runs on all the cores; optionally you can provide an executor to run it on a cluster). Then you plot the data with:
As you can see, it is not nearly as short as your code, and neither do we provide functionality to save the data. Also, the interface we have is not really optimized to easily generate the combos, but this is where we can learn from xyzpy. On the other hand, I think there is probably something useful for you in adaptive too.

(P.S. this is not really an "issue", but more of a place to hopefully exchange some ideas)
EDIT: Inspired by your work, I've created this PR, after which one can just do: