From @betanalpha on August 16, 2013 14:40
Whoops, forgot about the imaginary components, so I guess something like
void fft(vector x, vector r, vector i);
@betanalpha, think we should still have this as part of the language?
From @betanalpha on May 22, 2014 20:47
There are lots of physics use cases, if only for preprocessing in transformed data. The holdup was whether we wanted to (a) find a way to limit functions to transformed data or (b) figure out the analytic gradient for the FFT.
From @bob-carpenter on May 23, 2014 13:35
And a bigger issue is data-only arguments, because in some cases we might only want to implement some of a function's gradients.
Definitely not impossible, but a fair bit of work in the parser.
From @mbrubake on May 23, 2014 15:35
So I totally think we should add FFT at some point as it's very useful for some models. The gradients are easy (the FFT and Inverse FFT are both linear operators) so we don't need to worry about data-only functions.
The problem with adding FFT is that we need some way to handle complex numbers. There are basically two options that aren't complete hacks:
1. Actually implement a complex type which works just like real does.
2. Add support for functions which return more than one value. The FFT would then simply take and return the real and imaginary coefficients separately.
Both of these options could potentially turn into a fair amount of work. That said, I vote for 2. For one, I don't think complex numbers are going to be widely enough used to be worth the pain of implementing and supporting them. On the other hand, allowing functions to return multiple values seems much more useful and may not be that hard if we make suitable use of Boost tuples (http://www.boost.org/doc/libs/1_55_0/libs/tuple/doc/tuple_users_guide.html).
From @bob-carpenter on May 23, 2014 16:51
Yes, I really want to add multiple outputs. What they call lists in the R world, I think, though of course, I'd want to type the ones in Stan.
I don't know what the size limits are on Boost's tuples, but we probably don't need huge lists.
I also really like the Python syntax for literals and assignments, which lets you do things like (a,b,c) to create a list out of a, b, and c. It's all kosher for us, because we can infer the type.
vector a;
vector b;
(a, b) <- fft(...);
We probably want to have the right co(ntra)variance with the type, so that tuples follow assignment rules pointwise. That is, I can use an (int,real) argument anywhere I can use a (real,real) argument, and I can use an (int,real) result anywhere I could use a (real,real) result.
From @mbrubake on May 23, 2014 20:08
I'm also a pretty big fan of the Python syntax in this case, so that sounds perfect to me. I'm pretty sure boost::tuple can handle as many entries as we will ever reasonably need; I'm hard pressed to imagine a function returning more than 5 things.
One challenge we might run into is if we need different functions for different numbers of returns, e.g., a function which does one thing if it returns one object and something else if it returns two. Because we can't deduce template parameters based on return type in C++, how these functions return values would need to be different from a simple return value. Probably not hugely important to begin with, but maybe worth thinking about.
From @betanalpha on May 24, 2014 16:53
> And a bigger issue is data-only arguments, because in some cases we might only want to implement some of a function's gradients. Definitely not impossible, but a fair bit of work in the parser.

I was thinking of something simpler — functions that can only be called in the transformed data block.
From @bob-carpenter on May 24, 2014 23:16
> I'm hard pressed to imagine a function returning more than 5 things.

No, but I can easily imagine an R programmer stuffing 45 things into a list.

> One challenge we might run into is if we need different functions for different numbers of returns.

We do the same thing as C++ in not allowing two different functions with the same inputs and different return types. I don't think that's usually a problem, though --- when it's happened to me it's always been a mistake.
From @bob-carpenter on May 24, 2014 23:17
> I was thinking of something simpler — functions that can only be called in the transformed data block.
That would definitely be easier to implement.
From @mbrubake on May 24, 2014 23:26
> That would definitely be easier to implement.
I don't understand why we need to, though, at least for FFT. The analytic gradients are pretty easy. Heck, I'll volunteer to do them if someone else goes to the trouble of implementing everything else but is having trouble with that.
From @bob-carpenter on May 24, 2014 23:53
It sounds like we don't need to do this for FFT.
But there are things like cumulative distribution functions and some crazy densities that are easy to differentiate with respect to some but not all of their parameters.
But let's cross that bridge when we come to it.
As another physical science user, I'd also be very interested in fft/ifft.
We can worry about the language details after it gets into the math library. This should be an easy enough project for someone wanting to contribute to the math library.
Is there a pointer to the analytic gradient definitions somewhere?
The analytic gradients are the easy part of this.
d fft(y) = fft(dy)
The hard part is dealing with complex values and/or multiple return values.
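To spell this out: the DFT is multiplication by a constant matrix, so the differential passes straight through it:

$$
X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N},
\qquad
\frac{\partial X_k}{\partial x_n} = e^{-2\pi i k n / N},
$$

so writing $X = F x$ for the fixed matrix $F$, we get $dX = F \, dx$, i.e., $d\,\mathrm{fft}(y) = \mathrm{fft}(dy)$.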
Regarding the analytic gradients, we should be a little bit careful here. The FFT is certainly linear when its target is the complex numbers, but the amplitudes and phases are not at all linear in the input. The analytic gradient for the squared amplitude certainly isn't bad, but the analytic gradient for the phase is slightly annoying to write down without being able to manipulate complex numbers. I'd like to vote for doing this with a complex type.
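To make the annoyance concrete, write the $k$-th output in terms of its real and imaginary parts $a_k$ and $b_k$:

$$
|X_k|^2 = a_k^2 + b_k^2,
\qquad
\phi_k = \operatorname{atan2}(b_k, a_k),
\qquad
\frac{\partial \phi_k}{\partial a_k} = \frac{-b_k}{a_k^2 + b_k^2},
\quad
\frac{\partial \phi_k}{\partial b_k} = \frac{a_k}{a_k^2 + b_k^2}.
$$

The phase gradient is singular at $a_k = b_k = 0$, which is what makes the amplitude/phase parameterization awkward without a complex type.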
Will the complex type only need double real and imaginary components?
Does Eigen have the complex support for the operations you need? It's not complete, but it's getting better.
If both are yes, then I don't see a problem in terms of how all the autodiff works. I have no idea about efficiency, accuracy and robustness.
I've been told before that when I grow up I'll learn to think of matrix operations in terms of complex numbers.
Planning to bring this up in the meeting tomorrow, but I had a chat with @bob-carpenter and @betanalpha and we settled on implementing a family of FFT functions without creating a complex type. fft_amp2 will give the squared amplitudes, fft_phase will give the phases, and an as-yet-unnamed function will return the amplitudes and phases packaged as a 2 x n Eigen matrix. We'll probably also give the real and imaginary components and a corresponding 2 x n matrix, for convenience of computing the inverse Fourier transform in the future.
@thelseraphim Could you write out the Stan language signatures for the functions you're proposing just to make sure we're on the same page before you build them? Thanks.
From @wds15 on the stan-dev list:
One of the very cool applications of an FFT (and its inverse) is the possibility to calculate a convolution as a simple product in the transformed space. This is for example handy if one wants to get the difference distribution of two quantities: one simply FFT-transforms the two functions, multiplies them, and then takes the inverse transform of the product.
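For reference, this is the discrete convolution theorem: with $\mathcal{F}$ denoting the DFT,

$$
(x * h)_n = \sum_m x_m \, h_{n-m}
\quad\Longleftrightarrow\quad
\mathcal{F}(x * h)_k = \mathcal{F}(x)_k \, \mathcal{F}(h)_k,
$$

so $x * h = \mathcal{F}^{-1}\!\big(\mathcal{F}(x) \cdot \mathcal{F}(h)\big)$, turning an $O(NH)$ direct sum into $O(N \log N)$ transforms (with zero-padding to avoid circular wraparound).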
@wds15: This is one of the reasons I reopened the issue---other people were asking for the same kind of convolutions.
Just wanted to record the result of conversations offline and on discourse in the thread here. The plan initially is to implement the following function signatures, as @betanalpha spelled out on discourse
real[] fft_amps(real[] x);
real[] fft_phases(real[] x);
real[,] fft_amps_phases(real[] x);
We'll also want the matrix versions of these functions:
vector fft_amps(vector x);
vector fft_phases(vector x);
matrix fft_amps_phases(vector x);
Adding real[,] fft_real_imag(real[] x) and its matrix version should be only infinitesimally more work and is much friendlier to future messing around with convolutions, ifft, etc. So that may or may not go in on the first pass.
I think we can get away with just the vector version. We generally require arguments to multivariate functions to be vectors in Stan, though sometimes we allow arrays and row vectors, too.
It's easier to code review in smaller chunks, so if you just do these two:
vector fft_amps(vector x);
vector fft_phases(vector x);
that'd be a great start.
For the compound result, if it's two vectors you get back conceptually, the signature should specify the return as an array of vectors.
vector[] fft_amps_phases(vector x);
It'll make it more efficient to pull the individual vectors out and it's conceptually how I want to encourage people to think of collections of things. BUT, if the result is something that is going to go into matrix operations, then by all means, it needs to be a matrix.
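To make the proposed usage concrete, here is a minimal sketch of calling the compound function with the vector[] return just described; fft_amps_phases is the signature proposed in this thread, not an existing Stan function:

vector[N] ap[2];
vector[N] amps;
vector[N] phases;
ap <- fft_amps_phases(x); # hypothetical: proposed in this thread
amps <- ap[1];            # first element: amplitudes
phases <- ap[2];          # second element: phases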
Feel free to ignore me since I'm not active at the moment, but I figured I'd throw in my $0.02.
• The choice of amplitudes and phases seems like a bad idea. In particular, it introduces a significant non-linearity (with a singularity, no less) into a transformation which is otherwise linear. Doing it in terms of real and imaginary components retains this linearity, would generally be faster, and is much more standard. It is then easy to convert to amplitudes and phases if you need them with calls to basic functions.
• Having just a single function which returns an array of vectors/matrices is definitely better than separate functions, and I would question whether it's worth actually having the separate functions.
• As for vector[] vs matrix, I think vector[] makes a little more sense. Plus, it would make it possible to consistently extend to a 2D FFT, i.e., to have a matrix[] fft2(matrix x); function.
By all means pipe up! We appreciate the input.
I don't understand all this complex transformation stuff, so I think Thel and Michael should work that out. I don't even understand the use cases.
As another interested user, I agree with @mbrubake that real and imaginary parts are what would be useful. Here is a brief example of how I would use it:
My measured experimental data points are vector[N] y. The measured y is a discrete convolution of a known response function vector[H] h and a sample variable vector[N+H-1] x. I have a model for x, which I fit using the data y. To do this, I need to compute the discrete convolution for the modeled x:
for (i in 1:N) {
# Discrete convolution, using only the region where kernel
# and model overlap fully (convolve mode='valid' in scipy or Matlab)
y_modeled[i] <- dot_product(h, segment(x_modeled, i, H));
}
y ~ normal(y_modeled, sigma);
This is very slow if H is large. The discrete convolution can be calculated much faster by taking the FFT of x_model and h, multiplying the Fourier transforms, and inverse transforming. With N = 10,000 and H = 1,000, I get a factor of 3-5 speedup using the FFT to compute the convolution (timings from scipy). My model code, if Stan implemented fft and ifft, would look like:
x_fft <- fft(x_model); # Defined to return an array of 2 vectors
x_re <- x_fft[1];
x_im <- x_fft[2];
y_re <- h_re .* x_re - h_im .* x_im; # Assuming h_im, h_re passed in as data
y_im <- h_re .* x_im + h_im .* x_re; # Plain + and - are already elementwise on vectors
y_model <- ifft(y_re, y_im);
y ~ normal(y_model, sigma);
Thanks. That's really helpful.
FFT is how we compute gravity forces, with which we can build a gravity solver model in Stan. We have been exploring this using some customized tools for a while, but we would like to try porting the heavy lifting to Stan.
Gradients of complex numbers have been worked out; some explanations were given by the autograd authors at https://github.com/HIPS/autograd/issues/179 . Real-to-complex transformations are more useful in our cases -- slightly more complicated due to Hermitian constraints, but not that much more difficult.
@rainwoodman We're working on adding some basic FFT functionality. Can you be more specific about what function you'd like to see in Stan?
I have to be honest, I have not built anything with Stan yet -- I just got very inspired by chapter 19 of the manual, because our model is a PDE solver.
FFT (real-to-complex and complex-to-real) is half of what we need. We also need a few 'convolution' operators, which I've named paint/readout.
Here is a write-up of the operators:
https://bids.berkeley.edu/news/automatic-differentiation-and-cosmology-simulation
Jump to "Two Useful Operators in Particle-Mesh Solvers"
'Psi' operators (vector-Jacobian products in autograd) are useful for back-propagation of the gradients. I was using the naive gradient convention for complex-number derivatives -- it makes the real-to-complex gradients easier to write, but it is mathematically uglier than the correct Wirtinger derivatives. If Stan has never dealt with complex gradients, then it is important to make a decision here.
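For readers who haven't met them, the Wirtinger derivatives mentioned here are, for $z = x + iy$,

$$
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i \frac{\partial}{\partial y}\right),
\qquad
\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}\right).
$$

Autodiff frameworks differ in which of the two conventions they propagate for real-valued objectives, which is exactly the decision being flagged here.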
Once the FFT lands we can worry about the paint/readout operators. They are currently implemented in C and packaged in a Python package. [1] It wouldn't be hard to lift them out into Stan, I imagine. I can contribute a plugin if you point me to where to look.
Eventually we will worry about parallelism -- we deal with millions of parameters and want to push to billions. Our current tool uses MPI, since it is not fault tolerant and thus easier to program (at least for us, with an HPC background). Is there a desire along these lines in Stan development?
[1] https://github.com/rainwoodman/pmesh/blob/master/pmesh/_window.pyx
For a project of this scale you definitely don’t want to be implementing the solver directly in Stan, and you definitely don’t want to be naively autodiff’ing through the solver itself, which will be slower and require a gigantic amount of memory.
What you want to do is autodiff the PDE system itself, yielding a larger system of PDEs whose solutions yield not only the states but also the sensitivities. This larger system can then be solved numerically, with care to ensure that the mesh or whichever numerical approach you’re taking is sufficiently adapted to the new equations. One of the reasons why we haven’t tried to tackle PDEs in Stan is that, unlike with ODEs, there’s no sufficiently generic solver for PDEs, and the extended system will likely require some hand-holding to adequately tune. But we can certainly help with the design of such an implementation.
This approach is not only faster, more stable, and less memory intensive, it also allows everything to be implemented in C++. Indeed, this is how we solve ODEs behind the scenes in Stan. Moreover, this approach has proven empirically successful for models whose likelihoods include large scale (O(10^7)) N-body solves. That said, you’ll want to scale back the problem to make sure that you can adequately quantify uncertainty on smaller problems first before pushing the size of the simulation. (I’m holding out hope that the interest in using Stan is partially due to being able to do accurate Bayes, and not just using the math library to compute gradients to plug into a fragile MLE or MAP fit…)
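For the ODE case this is the standard forward-sensitivity construction: for states $u$ and parameters $\theta$,

$$
\frac{du}{dt} = f(u, \theta, t),
\qquad
\frac{d}{dt}\left(\frac{\partial u}{\partial \theta_k}\right)
= \frac{\partial f}{\partial u} \frac{\partial u}{\partial \theta_k}
+ \frac{\partial f}{\partial \theta_k},
$$

so $N$ states with $K$ parameters become a coupled system of size $N(K+1)$, which is the counting that comes up just below.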
Thanks a lot!
I hadn't thought about evolving the gradient of the likelihood as an additional set of variables.
Is it similar to page 12 of http://wwwu.uni-klu.ac.at/bkaltenb/AD12_kaltenbacher.pdf ?
Could you share a reference or a keyword for me to look up the work on likelihoods with N-body solvers? That is exactly the problem we are facing -- we currently use back-propagation to compute the gradients. Our parameter is the initial condition (O(10^7)), so joint forward modelling of the sensitivities may not work.
I’d start by reading through the section on ODEs in the Stan autodiff paper, https://arxiv.org/abs/1509.07164. And then it’s probably best to continue this discussion on the User or Dev lists.
And maybe next time don’t have an AstroHackWeek that leaves out high-performance alternatives. ;-)
Ah. I see. Page 40, Chapter 13.4, coupled size N * (N + 1) is what drove us away from the coupled variable approach. N is O(10^7) in our case. We may want to reformulate our problem.
Yes, it is a large system, but the extended components are relatively sparse and there is no way to get the necessary information otherwise. If you assume an extremely simple measurement model, such as pure Gaussian variation, then you can reduce that burden by projecting onto the expected output space directly (often ambiguously called adjoint methods), but in general it will be worth the cost for the full generality. You can also split the N (K + 1) burden into N K + N separate, parallelizable solves.
I somehow had the wrong impression that the adjoint method was identical to back-propagation, and so missed the entire literature. We shall look into it more carefully. Thanks!
Back-prop in neural nets is just a simple instance of the adjoint method in autodiff, so I'm not sure what you're asking here. But whatever it is, would you mind moving this discussion to our users list, or to the dev list if you want to get involved in development? We try to keep issue discussions on topic and not let them devolve into speculative discussions. Thanks.
I'm going ahead and adding this now that we have complex vectors and can code it the natural way.
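With complex vectors in the language, the convolution example from earlier in the thread might reduce to something like the following sketch. It assumes fft and inv_fft operating on complex vectors, a get_real extractor, elementwise .* on complex vectors, and that h_padded is h zero-padded to the same length M as x_model; check the current function reference for the exact signatures.

complex_vector[M] xf = fft(x_model);   // real input assumed to promote to complex
complex_vector[M] hf = fft(h_padded);  // h zero-padded to length M, passed as data
vector[M] y_model = get_real(inv_fft(xf .* hf)); // convolution theorem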
From @betanalpha on August 16, 2013 14:28
Expose Eigen's FFT into the modeling language:
vector fft(vector x)
Copied from original issue: stan-dev/stan#179