Strange! I did extensive testing of the code, so I'm wondering if a bug was introduced recently. Here is a comparison plot for calculations up to N=10:
https://github.com/rodluger/limbdark/blob/master/tex/figures/julia/Light_curve_grid.png
I can't recall how high an l_max I tried.
Come to think of it, I may have introduced the errors at very large l, since there is no longer an lfact function in Julia 1.0. So this line is now significantly less precise, and is probably causing the OverflowError I mentioned above. Let me try running my comparison script on the master branch to see if I still get those flux oddities.
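One possible workaround (just a sketch of the idea, not what the code currently does; the logfact helper below is made up) would be to evaluate the factorials in log space:

```julia
using SpecialFunctions  # provides loggamma

# Sketch only: compute log(n!) via the log-gamma function so large factorials
# never overflow, and exponentiate just the final ratio.
logfact(n) = loggamma(n + 1.0)

# Example: a binomial-type factorial ratio for n well beyond where
# factorial(n) overflows Int64.
n, k = 60, 25
ratio = exp(logfact(n) - logfact(k) - logfact(n - k))
```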
I haven't worked with branches before on git, so I'm having trouble finding compare_to_starry.jl in my local copy of the limbdark repo.
You could just download the file directly and run it. Probably best, since we should test the code as it was before all of my other changes.
But for future reference, to access my new v1.0.0 branch:

git fetch
git checkout v1.0.0

That puts you on my branch. To get out of it, just do

git checkout master
I did precisely those commands, and yet I can't find compare_to_starry.jl in tex/figures/julia/
Either way: you've additionally sold me on Julia by showing me how easy it is to call the Python version of starry!
You could try git pull origin v1.0.0
Okay, thanks. I have written a test to compare the analytic formula with numerical integration, but perhaps it isn't comprehensive enough:
https://github.com/rodluger/limbdark/blob/master/tests/test_transit_poly.jl
Yeah, I suspect not. I confirm that the main issues in my first post are present in the master branch, prior to any of my changes.
I updated the master branch with my comparison script.
P.S.: The syntax for calling Python functions from Julia is hoooooorrible!!
Yes, I noticed that when calling matplotlib via PyPlot, although I've gotten used to it!
@rodluger Is that runtime plot using your starry version, or the Julia version?
@ericagol Using my starry version
@rodluger I extended test_transit_poly.jl to test the same light curve that you computed, and it seems to work fine within a tolerance of 10^{-4} (which is the limitation of the numerical computation). This test calls transit_poly(r,abs(b),u_n), not transit_poly!(trans), so I think the bug is in the latter (which uses the trans structure, the Julia version of an object, to save time on memory allocation). Indeed, when I make the following change to compare_to_starry.jl:

# flux_julia[i] = transit_poly!(trans)
flux_julia[i] = transit_poly(r,abs(b[i]),u_n)

then the agreement looks good, with a maximum absolute difference of 4e-7 (which is a bit larger than I would have expected). So the problem seems to lie in the transit_poly!(trans) function; nice catch! And this indicates that when I write tests, I should test every version of a function.
Here is the comparison for N=30:
@rodluger I'm still unclear on why my timing tests aren't scaling in the expected way. It may be worth doing a comparison of the absolute times in addition to the relative times.
@rodluger I think I'm closing in on the bug... I think some of the components of the transit structure that should be set to zero are not. If I re-initialize the transit structure at every step in compare_to_starry.jl, then the results of the two codes agree:

trans = transit_init(r, abs(b[i]), u_n, true)
flux_julia[i] = transit_poly!(trans)
When the structure is not re-initialized, there are differences in sn, Jv, dJvdk, dsndr, dsndb, dfdrbc, and dfdrbu (when computing the gradient), while all of the other components agree. This should help me diagnose which components are causing problems. Thanks again for catching this! I haven't used structures/objects in programming too often, so I'm not surprised I made a mistake with this. The main goal in creating the structure was to avoid having to re-allocate vectors and arrays every time the function is called, which helped to speed it up, but obviously introduced some error.
One thing I noticed in this problem is that the c_n coefficients can get very large. I think what may be going on is that for nearly constant values of u_n, the higher order components nearly cancel, so the c_n likely have to become large to compensate for this. I'm wondering if this also occurs if the u_n values are not nearly equal.
@ericagol Cool. I confirm that your code works when you re-initialize the struct, so this is likely a memory issue. I think we should wait to think about the timing tests until you've found the bug, since I suspect that once you fix it you'll get similar scalings to mine. Currently, with the bugged version, my code is faster than yours up to about lmax=10, but then your code blows mine out of the water with that n^1/3 scaling! I suspect that there's something in there causing the code to use results from the previous step instead of re-calculating them, giving you the amazing scaling. If I try to do the timing benchmarks by re-initializing the struct each time, the code takes forever, so that's not a good metric either.
Separately, I seem to have some precision issues at high lmax. I get severe numerical instabilities starting around lmax=35, but you seem to be able to get up to lmax=50 before they dominate. I'm going to look into that today.
@ericagol Never mind, I was comparing it to the previous version of starry, which still implemented a change of basis from spherical harmonics to limb darkening coefficients. As I mentioned before, those operations are imprecise at high l. The new version of the code, which is on the limbdark branch of the starry repo, matches yours a lot better at high l. Note that your comment about the large difference (4e-7) between our codes is probably due to this. When comparing to the newer version of my code, the max difference is 3e-8 and the median difference is 1e-9 for l = 30:
@rodluger That looks great! I would have expected the comparison to be closer to double precision rather than single precision, though.
I found the bug - it's in this line:
The problem I was trying to fix was that sometimes the series expansion of J_v gives values that are zero due to underflow, and so the iteration subsequently gives incorrect J_v in the downward recursion towards smaller values of v. So, what I meant to do was to first compute J_v for the maximum value of v, and then to check whether its value is zero. This works when the transit structure has just been initialized, since the array J_v is set to zero; but once J_v is computed, on the second call it will be non-zero, so the top value of J_v is taken from the last step, causing the problem. I'll push the fix to the master branch - what I do is eliminate the if statement and just set t.Jv[v+1] = zero(k2) (so that the zero has the same type as the input variable k2).
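As a generic illustration of the pitfall (this is not the actual limbdark source; the Workspace struct and Jv_series are stand-ins), a field of a reused, preallocated structure that is only conditionally overwritten silently carries over its value from the previous call:

```julia
# Illustration only: a preallocated workspace whose entry is guarded by a
# zero check behaves differently on the first call and on later calls.
struct Workspace
    Jv::Vector{Float64}
end

Jv_series(k2, v) = k2^v / (2v + 1)   # stand-in for the real series evaluation

function top_jv!(t::Workspace, k2, v)
    # Buggy pattern: only recompute when the stored value is zero, which is
    # only guaranteed right after initialization.
    if t.Jv[v+1] == 0
        t.Jv[v+1] = Jv_series(k2, v)
    end
    return t.Jv[v+1]
end

function top_jv_fixed!(t::Workspace, k2, v)
    # Fix: eliminate the `if`, reset the entry with the same type as k2,
    # and always recompute it.
    t.Jv[v+1] = zero(k2)
    t.Jv[v+1] = Jv_series(k2, v)
    return t.Jv[v+1]
end

t = Workspace(zeros(6))
top_jv!(t, 0.3, 5)        # first call: Jv[6] computed fresh
top_jv!(t, 0.9, 5)        # second call: returns the stale value from k2 = 0.3
top_jv_fixed!(t, 0.9, 5)  # always recomputes
```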
Nice! Can you run your timing tests again with the fixed version and see how things scale?
@ericagol Re: your comment about single versus double precision. It seems like both our codes are yielding precision close to the 1e-9 level for l = 30. Here's the difference between our two models and the 128-bit version of starry:
Yours is slightly more precise, but both our noise floors are at about 1 ppb.
As you pointed out earlier, this is probably related to the fact that the c_n coefficients get very large.
@rodluger Okay, that explains it. Is this the result of compare_to_starry.jl?
This could be a rabbit-hole, but I'm wondering if there is a more stable formulation of the Green's components.
It's definitely a rabbit hole! I think that in general, going much beyond 30th degree is unnecessary, even with our plan to fit stellar models perfectly. Plus, higher order coefficients tend to be small anyways, so they will contribute several orders of magnitude less noise than what we're seeing here. The regularization we plan on doing for these fits should damp their values even further, so I think we are OK for now.
(To compare to 128-bit starry, just pass the multi=true keyword when instantiating a Map.)
@ericagol Any luck re-running the timing tests? Something's weird -- the code seems to be taking an extremely long time to run now. I had to cancel it before I got any results.
@rodluger Sorry, I haven't had the time yet... will get to this this afternoon (I hope).
@rodluger I've re-run the timing tests, and they seem to run fine. Here are the figures, and now the scaling with the number of coefficients seems to be even better than before - about n^{0.24}.
@rodluger By the way, I am computing all of the light curve derivatives in this computation.
@ericagol Wow! Have your latest changes been pushed to github? I'll play around with this today.
@rodluger Sorry, I thought I had pushed the changes, but I hadn't. I just tried to synchronize and push the changes to the master branch.
@ericagol I think I found the issue! It's on this line of your script. Instead of looping through the values of nu = [1,2,3,5,8,13,21,34,55,89], you're looping over the indices, so you are instead timing how long it takes to run maps of degree 1 to 10 only.
If I run that with my code, I get exactly the same scaling as you. But, as I mentioned on Skype, above about n = 10 the runtime scales more steeply because of the quadratic dependence of the nested for loops. I suspect that if you fix this issue, you'll get the same scaling as starry!
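Schematically, the difference is just this (a sketch with a placeholder benchmarking call, not your actual script):

```julia
nu = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Buggy pattern: the loop variable is the index 1..10, so the benchmark
# only ever times maps of degree 1 through 10.
for i in 1:length(nu)
    n = i                  # <- should be nu[i]
    # run_benchmark(n)     # placeholder for the timing call
end

# Intended pattern: iterate over the values themselves.
for n in nu
    # run_benchmark(n)
end
```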
@rodluger Wow, what a mistake on my part! Thanks for catching this; that explains why the code is so much slower for large numbers of limb-darkening coefficients when I call the function directly. So, that likely resolves the first and third problems in issue #40. For large numbers of limb-darkening coefficients, I found that about half of the time was spent applying the chain rule to convert derivatives with respect to c_n into derivatives with respect to u_n. I also found that when computing derivatives, I have a whole bunch of unnecessary divisions. I've fixed these two problems, and found a speed-up of about a factor of two in the large-number-of-coefficients limit. I'm still looking into the profiling to see where most of the time is spent, but this may be moot now. Then, I still need to find out what is responsible for the difference between the Julia versions.
@rodluger I've looked into the scaling with n, the number of limb-darkening coefficients, and the Julia code seems to scale linearly for large values:
(I reduced the maximum number of impact parameters to 10^5, and now only do one timing measurement rather than nine, to speed up the computation.)
@rodluger I'm looking into the code bottlenecks, and it looks like for n=55, I'm still spending half of the time on converting the c_n derivatives to u_n derivatives, and the other half on the remaining computation (Iv, Jv, and computing the light curve and derivatives from these). The lines that seem to be causing your bottleneck account for about 10% of the run time in the Julia version.
@rodluger Okay, I figured out how to use the BLAS routine gemv to speed up the computation of the limb-darkening derivative propagation. This gives a ~40% overall speedup for n=55 (it shouldn't make much difference for small n). The benchmarks now look as follows (going up to 10^5 data points and only one iteration rather than nine):
benchmark_limbdark_timing.pdf benchmark_poly_transit.pdf
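As a self-contained sketch of the idea (the array names and shapes are assumptions, not the actual limbdark variables), the chain-rule propagation becomes a single in-place matrix-vector product:

```julia
using LinearAlgebra  # BLAS lives here

# Sketch: propagate flux derivatives from the c_n basis to the u_n basis,
# dfdu = (dc/du)' * dfdc, with an in-place BLAS call on preallocated arrays.
n    = 55
dcdu = randn(n + 1, n)   # hypothetical Jacobian of the c_n w.r.t. the u_n
dfdc = randn(n + 1)      # hypothetical derivatives of the flux w.r.t. the c_n
dfdu = zeros(n)          # preallocated output, reused between calls

# gemv!(tA, α, A, x, β, y) computes y = α*A'*x + β*y when tA = 'T',
# without allocating any temporaries.
BLAS.gemv!('T', 1.0, dcdu, dfdc, 0.0, dfdu)
```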
I've also now implemented an iterative assignment of the I_v values to two variables, Iv1 and Iv2; as I iterate downwards, I recursively set Iv2=Iv1 and get the next Iv1 value from the array, and do the same for Jv. I'm having trouble making a profile comparison, but I think this change only sped up the code slightly.
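Roughly, the pattern is this (a sketch only, with a made-up recursion body):

```julia
# Sketch: cache the two most recent I_v values in scalars during the
# downward recursion instead of indexing the array at every use.
function downward_pass(Iv::Vector{Float64}, vmax::Int)
    Iv1 = Iv[vmax + 1]        # current value, I_{vmax}
    Iv2 = zero(Iv1)
    total = 0.0
    for v in vmax-1:-1:0
        Iv2 = Iv1             # shift the previous value down
        Iv1 = Iv[v + 1]       # pull the next value from the array
        total += Iv1 - Iv2    # placeholder for the actual light-curve terms
    end
    return total
end

downward_pass(rand(11), 10)
```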
@rodluger I watched the Activity Monitor during the latest benchmark test, which used the BLAS.gemv! routine for converting from c_n to u_n derivatives, and the CPU percentage reaches ~370%, which I believe means that Julia is employing all four processors. So, the speed-up in this case is, I think, simply due to multi-threading, and thus it is not a fair comparison.
Here are the results when I go back to the multiplication with loops, which is faster for small n but slower for large n, and the Activity Monitor shows the CPU at 100%, so no multi-threading. There is an @inbounds macro which can be used to speed up loops, and when I apply it, I can almost match the speed of BLAS.gemv!, but without any multi-threading (the Activity Monitor shows 100%)! I might be able to turn on multi-threading to gain even more speed...
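For reference, the loop form of the same chain-rule product with @inbounds looks roughly like this (same hypothetical names as in the gemv! sketch above):

```julia
# Sketch: the explicit-loop version of dfdu = (dc/du)' * dfdc, with @inbounds
# telling the compiler to skip array bounds checks inside the hot loop.
function chainrule!(dfdu::Vector{Float64}, dcdu::Matrix{Float64}, dfdc::Vector{Float64})
    fill!(dfdu, 0.0)
    @inbounds for j in eachindex(dfdu)
        for i in eachindex(dfdc)
            dfdu[j] += dcdu[i, j] * dfdc[i]
        end
    end
    return dfdu
end

chainrule!(zeros(55), randn(56, 55), randn(56))
```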
For the paper, it probably doesn't make sense to go beyond l_max ~ 45 due to the instability.
@rodluger In compare_to_starry.jl, how do I obtain the derivatives from starry in the line:

flux_starry = map[:flux](xo=b, ro=r)

I would like to make a comparison between the starry and limbdark derivatives.
Do

flux_starry, grad = map[:flux](xo=0.1, ro=0.1, gradient=true)

and the gradient will be stored as a Dict in grad.
What is the y component of the grad dict?
Those are the derivatives with respect to all of the spherical harmonic coefficients. For pure limb darkening, the only derivative in there is the Y_{0,0} one. In the case of limb darkening, Y_{0,0} is equal to the disk-integrated flux of the body.
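For example (building on the call above; the only dictionary key used here is the "y" one you asked about):

```julia
# Continuing from the gradient call above: pull out the spherical-harmonic
# derivatives. For a purely limb-darkened map this is just the Y_{0,0} entry,
# i.e. the derivative with respect to the disk-integrated flux.
flux_starry, grad = map[:flux](xo=0.1, ro=0.1, gradient=true)
dfdy = grad["y"]
```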
Also, make sure you have the latest version of starry installed from source on the master branch. I only just pushed the code using your equations to master yesterday.
Hmmm... the derivatives disagree severely. It's probably a bug on my end as usual. I did synchronize starry in the github repository, but when I tried conda update starry I received the message # All requested packages already installed. So it's also possible that I'm not using the correct version of starry. I've modified compare_to_starry.jl in the master branch to include derivatives.
@ericagol I'll run it on my end in a bit and let you know what I get. The conda version has not yet been updated with the Agol & Luger formulae; you'd have to do a manual install from git to get it up to date. But my derivatives should have been correct before...
@rodluger I have a test (test_transit_poly_gradient2.jl) which computes finite-difference gradients at BigFloat precision and compares them with the analytic gradients, and it seems to pass. So, if there is a bug, it's probably an error in how I'm interpreting the gradients, or something similar.
@ericagol When I run your script with nu=3, this is what I get.
Your values of df/dr and df/db seem to have a discontinuity at b ~ -0.75. But that's not the only issue -- I'm still digging.
I finished coding up and benchmarking the limbdark equations in starry on the limbdark branch. I'm finding several issues with the julia implementation. I created this julia script to easily compare fluxes to the Python/C++ version.

Here are some of the issues. Below is a full transit calculation for an lmax = 6 limb-darkened map. In some regimes the two codes agree exactly, but near the limb and near the center your julia code is giving spurious results:

The issues get worse when I go to higher order. Here's lmax = 30:

I find that I'm unable to go above about 30 with julia -- I get OverflowErrors in the factorial_lookup function. In some regimes of b, r space your code works, but I find that it becomes unstable near lmax = 50. My code actually gets unstable near that lmax as well. We should either identify the source of the instability or add something to the paper saying that's as far as we can currently go.

Finally, the run times. I find that for lmax up to about 20 my code is just as fast or faster than yours, but above that yours becomes faster. I suspect once you find the source of the errors above, you'll get longer runtimes. @dfm and I spent some time looking at how various calculations scale, and I'm very surprised by the n^1/3 or n^3/8 scaling you find. The number of operations scales linearly with the number of coefficients in some parts of the code, and quadratically in others, so the total scaling should be somewhere in between. Here is my version of your paper figure:

Note that up to lmax ~ 10 I find roughly the same scaling as you, but above that the runtime begins to scale more steeply. The dashed lines are linear scalings and the dotted lines are quadratic scalings: at high lmax, the slope is somewhere in between the slopes of those lines, consistent with my scaling arguments from poking around in the code.