Strange! I did extensive testing of the code, so I'm wondering if a bug was introduced recently. Here is a comparison plot for calculations up to N=10:
https://github.com/rodluger/limbdark/blob/master/tex/figures/julia/Light_curve_grid.png
I can't recall how high an l_max I tried.
Come to think of it, I may have introduced the errors at very large l, since there is no longer an lfact function in Julia 1.0. So this line is now significantly less precise, and is probably causing the OverflowError I mentioned above. Let me try running my comparison script on the master branch to see if I still get those flux oddities.
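One possible workaround (just a sketch of the idea, not what the code currently does; the logfact helper below is made up) would be to evaluate the factorials in log space:

```julia
using SpecialFunctions  # provides loggamma

# Sketch only: compute log(n!) via the log-gamma function so large factorials
# never overflow, and exponentiate just the final ratio.
logfact(n) = loggamma(n + 1.0)

# Example: a binomial-type factorial ratio for n well beyond where
# factorial(n) overflows Int64.
n, k = 60, 25
ratio = exp(logfact(n) - logfact(k) - logfact(n - k))
```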
I haven't worked with branches before on git, so I'm having trouble finding compare_to_starry.jl in my local copy of the limbdark repo.
You could just download the file directly and run it. Probably best, since we should test the code as it was before all of my other changes.
But for future reference, to access my new v1.0.0 branch:

git fetch
git checkout v1.0.0

That puts you on my branch. To get out of it, just do

git checkout master
I did precisely those commands, and yet I can't find compare_to_starry.jl in tex/figures/julia/
Either way: you've additionally sold me on Julia by showing me how easy it is to call the Python version of starry!
You could try git pull origin v1.0.0
Okay, thanks. I have written a test to compare the analytic formula with numerical integration, but perhaps it isn't comprehensive enough:
https://github.com/rodluger/limbdark/blob/master/tests/test_transit_poly.jl
Yeah, I suspect not. I confirm that the main issues in my first post are present in the master branch, prior to any of my changes.
I updated the master branch with my comparison script.
P.S.: The syntax for calling Python functions from Julia is hoooooorrible!!
Yes, I noticed that when calling matplotlib via PyPlot, although I've gotten used to it!
@rodluger Is that runtime plot using your starry version, or the Julia version?
@ericagol Using my starry version
@rodluger I extended test_transit_poly.jl to test the same light curve that you computed, and it seems to work fine within a tolerance of 10^{-4} (which is the limitation of the numerical computation). This test calls transit_poly(r,abs(b),u_n), not transit_poly!(trans), so I think the bug is in the latter (which uses the trans structure, the Julia version of an object, to save time on memory allocation). Indeed, when I make the following change to compare_to_starry.jl:

# flux_julia[i] = transit_poly!(trans)
flux_julia[i] = transit_poly(r,abs(b[i]),u_n)

then the agreement looks good, with a maximum absolute difference of 4e-7 (which is a bit larger than I would have expected). So the problem seems to lie in the transit_poly!(trans) function; nice catch! And this indicates that when I write tests, I should test every version of a function.
Here is the comparison for N=30:
@rodluger I'm still unclear on why my timing tests aren't scaling in the expected way. It may be worth doing a comparison of the absolute times in addition to the relative times.
@rodluger I think I'm closing in on the bug... I think some of the components of the transit structure that should be set to zero are not. If I re-initialize the transit structure at every step in compare_to_starry.jl, then the results of the two codes agree:

trans = transit_init(r, abs(b[i]), u_n, true)
flux_julia[i] = transit_poly!(trans)
When the structure is not re-initialized, there are differences in sn, Jv, dJvdk, dsndr, dsndb, dfdrbc, and dfdrbu (when computing the gradient), while all of the other components agree. This should help me diagnose which components are causing problems. Thanks again for catching this! I haven't used structures/objects in programming too often, so I'm not surprised I made a mistake with this. The main goal in creating the structure was to avoid having to re-allocate vectors and arrays every time the function is called, which helped to speed it up, but obviously introduced some error.
One thing I noticed in this problem is that the c_n coefficients can get very large. I think what may be going on is that for nearly constant values of u_n, the higher order components nearly cancel, so the c_n likely have to become large to compensate for this. I'm wondering if this also occurs if the u_n values are not nearly equal.
@ericagol Cool. I confirm that your code works when you re-initialize the struct, so this is likely a memory issue. I think we should wait to think about the timing tests until you've found the bug, since I suspect that once you fix it you'll get similar scalings to mine. Currently, with the bugged version, my code is faster than yours up to about lmax=10, but then your code blows mine out of the water with that n^1/3 scaling! I suspect that there's something in there causing the code to use results from the previous step instead of re-calculating them, giving you the amazing scaling. If I try to do the timing benchmarks by re-initializing the struct each time, the code takes forever, so that's not a good metric either.
Separately, I seem to have some precision issues at high lmax. I get severe numerical instabilities starting around lmax=35, but you seem to be able to get up to lmax=50 before they dominate. I'm going to look into that today.
@ericagol Never mind, I was comparing it to the previous version of starry, which still implemented a change of basis from spherical harmonics to limb darkening coefficients. As I mentioned before, those operations are imprecise at high l. The new version of the code, which is on the limbdark branch of the starry repo, matches yours a lot better at high l. Note that your comment about the large difference (4e-7) between our codes is probably due to this. When comparing to the newer version of my code, the max difference is 3e-8 and the median difference is 1e-9 for l = 30:
@rodluger That looks great! I would have expected the comparison to be closer to double precision rather than single precision, though.
I found the bug - it's in this line:
The problem I was trying to fix was that sometimes the series expansion of J_v gives values that are zero due to underflow, and so the iteration subsequently gives incorrect J_v in the downward recursion towards smaller values of v. So, what I meant to do was to first compute J_v for the maximum value of v, and then to check whether its value is zero. This works when the transit structure has just been initialized, since the array J_v is set to zero; but once J_v is computed, on the second call it will be non-zero, so the top value of J_v is taken from the last step, causing the problem. I'll push the fix to the master branch - what I do is eliminate the if statement and just set t.Jv[v+1] = zero(k2) (so that the zero has the same type as the input variable k2).
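As a generic illustration of the pitfall (this is not the actual limbdark source; the Workspace struct and Jv_series are stand-ins), a field of a reused, preallocated structure that is only conditionally overwritten silently carries over its value from the previous call:

```julia
# Illustration only: a preallocated workspace whose entry is guarded by a
# zero check behaves differently on the first call and on later calls.
struct Workspace
    Jv::Vector{Float64}
end

Jv_series(k2, v) = k2^v / (2v + 1)   # stand-in for the real series evaluation

function top_jv!(t::Workspace, k2, v)
    # Buggy pattern: only recompute when the stored value is zero, which is
    # only guaranteed right after initialization.
    if t.Jv[v+1] == 0
        t.Jv[v+1] = Jv_series(k2, v)
    end
    return t.Jv[v+1]
end

function top_jv_fixed!(t::Workspace, k2, v)
    # Fix: eliminate the `if`, reset the entry with the same type as k2,
    # and always recompute it.
    t.Jv[v+1] = zero(k2)
    t.Jv[v+1] = Jv_series(k2, v)
    return t.Jv[v+1]
end

t = Workspace(zeros(6))
top_jv!(t, 0.3, 5)        # first call: Jv[6] computed fresh
top_jv!(t, 0.9, 5)        # second call: returns the stale value from k2 = 0.3
top_jv_fixed!(t, 0.9, 5)  # always recomputes
```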
Nice! Can you run your timing tests again with the fixed version and see how things scale?
@ericagol Re: your comment about single versus double precision. It seems like both our codes are yielding precision close to the 1e-9 level for l = 30. Here's the difference between our two models and the 128-bit version of starry:
Yours is slightly more precise, but both our noise floors are at about 1 ppb.
As you pointed out earlier, this is probably related to the fact that the c_n coefficients get very large.
@rodluger Okay, that explains it. Is this the result of compare_to_starry.jl?
This could be a rabbit-hole, but I'm wondering if there is a more stable formulation of the Green's components.
It's definitely a rabbit hole! I think that in general, going much beyond 30th degree is unnecessary, even with our plan to fit stellar models perfectly. Plus, higher order coefficients tend to be small anyways, so they will contribute several orders of magnitude less noise than what we're seeing here. The regularization we plan on doing for these fits should damp their values even further, so I think we are OK for now.
(To compare to 128-bit starry, just pass the multi=true keyword when instantiating a Map.)
@ericagol Any luck re-running the timing tests? Something's weird -- the code seems to be taking an extremely long time to run now. I had to cancel it before I got any results.
@rodluger Sorry, I haven't had the time yet... will get to this this afternoon (I hope).
@rodluger I've re-run the timing tests, and they seem to run fine. Here are the figures, and now the scaling with the number of coefficients seems to be even better than before - about n^{0.24}.
@rodluger By the way, I am computing all of the light curve derivatives in this computation.
@ericagol Wow! Have your latest changes been pushed to github? I'll play around with this today.
@rodluger Sorry, I thought I had pushed the changes, but I hadn't. I just tried to synchronize and push the changes to the master branch.
@ericagol I think I found the issue! It's on this line of your script. Instead of looping through the values of nu = [1,2,3,5,8,13,21,34,55,89], you're looping over the indices, so you are instead timing how long it takes to run maps of degree 1 to 10 only.
If I run that with my code, I get exactly the same scaling as you. But, as I mentioned on Skype, above about n = 10 the runtime scales more steeply because of the quadratic dependence of the nested for loops. I suspect that if you fix this issue, you'll get the same scaling as starry!
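Schematically, the difference is just this (a sketch with a placeholder benchmarking call, not your actual script):

```julia
nu = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Buggy pattern: the loop variable is the index 1..10, so the benchmark
# only ever times maps of degree 1 through 10.
for i in 1:length(nu)
    n = i                  # <- should be nu[i]
    # run_benchmark(n)     # placeholder for the timing call
end

# Intended pattern: iterate over the values themselves.
for n in nu
    # run_benchmark(n)
end
```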
@rodluger Wow, what a mistake on my part! Thanks for catching this; that explains why the code is so much slower for large numbers of limb-darkening coefficients when I call the function directly. So, that likely resolves the first and third problems in issue #40. For large numbers of limb-darkening coefficients, I found that about half of the time was spent applying the chain rule to convert derivatives with respect to c_n into derivatives with respect to u_n. I also found that when computing derivatives, I have a whole bunch of unnecessary divisions. I've fixed these two problems, and found a speed-up of about a factor of two in the large-number-of-coefficients limit. I'm still looking into the profiling to see where most of the time is spent, but this may be moot now. Then, I still need to find out what is responsible for the difference between the Julia versions.
@rodluger I've looked into the scaling with n, the number of limb-darkening coefficients, and the Julia code seems to scale linearly for large values:
(I reduced the maximum number of impact parameters to 10^5, and now only do one timing measurement rather than nine, to speed up the computation.)
@rodluger I'm looking into the code bottlenecks, and it looks like for n=55, I'm still spending half of the time on converting the c_n derivatives to u_n derivatives, and the other half on the remaining computation (Iv, Jv, and computing the light curve and derivatives from these). The lines that seem to be causing your bottleneck account for about 10% of the run time in the Julia version.
@rodluger Okay, I figured out how to use the BLAS routine gemv to speed up the computation of the limb-darkening derivative propagation. This gives a ~40% overall speedup for n=55 (it shouldn't make much difference for small n). The benchmarks now look as follows (going up to 10^5 data points and only one iteration rather than nine):
benchmark_limbdark_timing.pdf benchmark_poly_transit.pdf
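As a self-contained sketch of the idea (the array names and shapes are assumptions, not the actual limbdark variables), the chain-rule propagation becomes a single in-place matrix-vector product:

```julia
using LinearAlgebra  # BLAS lives here

# Sketch: propagate flux derivatives from the c_n basis to the u_n basis,
# dfdu = (dc/du)' * dfdc, with an in-place BLAS call on preallocated arrays.
n    = 55
dcdu = randn(n + 1, n)   # hypothetical Jacobian of the c_n w.r.t. the u_n
dfdc = randn(n + 1)      # hypothetical derivatives of the flux w.r.t. the c_n
dfdu = zeros(n)          # preallocated output, reused between calls

# gemv!(tA, α, A, x, β, y) computes y = α*A'*x + β*y when tA = 'T',
# without allocating any temporaries.
BLAS.gemv!('T', 1.0, dcdu, dfdc, 0.0, dfdu)
```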
I've also now implemented an iterative assignment of the I_v values to two variables, Iv1 and Iv2; as I iterate downwards, I recursively set Iv2=Iv1 and get the next Iv1 value from the array, and do the same for Jv. I'm having trouble making a profile comparison, but I think this change only sped up the code slightly.
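Roughly, the pattern is this (a sketch only, with a made-up recursion body):

```julia
# Sketch: cache the two most recent I_v values in scalars during the
# downward recursion instead of indexing the array at every use.
function downward_pass(Iv::Vector{Float64}, vmax::Int)
    Iv1 = Iv[vmax + 1]        # current value, I_{vmax}
    Iv2 = zero(Iv1)
    total = 0.0
    for v in vmax-1:-1:0
        Iv2 = Iv1             # shift the previous value down
        Iv1 = Iv[v + 1]       # pull the next value from the array
        total += Iv1 - Iv2    # placeholder for the actual light-curve terms
    end
    return total
end

downward_pass(rand(11), 10)
```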
@rodluger I watched the Activity Monitor during the latest benchmark test, which used the BLAS.gemv! routine for converting from c_n to u_n derivatives, and the CPU percentage reaches ~370%, which I believe means that Julia is employing all four processors. So, the speed-up in this case is, I think, simply due to multi-threading, and thus it is not a fair comparison.
Here are the results when I go back to the multiplication with loops, which is faster for small n but slower for large n, and the Activity Monitor shows the CPU at 100%, so no multi-threading. There is an @inbounds macro which can be used to speed up loops, and when I apply it, I can almost match the speed of BLAS.gemv!, but without any multi-threading (the Activity Monitor shows 100%)! I might be able to turn on multi-threading to gain even more speed...
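For reference, the loop form of the same chain-rule product with @inbounds looks roughly like this (same hypothetical names as in the gemv! sketch above):

```julia
# Sketch: the explicit-loop version of dfdu = (dc/du)' * dfdc, with @inbounds
# telling the compiler to skip array bounds checks inside the hot loop.
function chainrule!(dfdu::Vector{Float64}, dcdu::Matrix{Float64}, dfdc::Vector{Float64})
    fill!(dfdu, 0.0)
    @inbounds for j in eachindex(dfdu)
        for i in eachindex(dfdc)
            dfdu[j] += dcdu[i, j] * dfdc[i]
        end
    end
    return dfdu
end

chainrule!(zeros(55), randn(56, 55), randn(56))
```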
For the paper, it probably doesn't make sense to go beyond l_max ~ 45 due to the instability.
@rodluger In compare_to_starry.jl, how do I obtain the derivatives from starry in the line:

flux_starry = map[:flux](xo=b, ro=r)

I would like to make a comparison between the starry and limbdark derivatives.
Do

flux_starry, grad = map[:flux](xo=0.1, ro=0.1, gradient=true)

and the gradient will be stored as a Dict in grad.
What is the y component of the grad dict?
Those are the derivatives with respect to all of the spherical harmonic coefficients. For pure limb darkening, the only derivative in there is the Y_{0,0} one. In the case of limb darkening, Y_{0,0} is equal to the disk-integrated flux of the body.
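For example (building on the call above; the only dictionary key used here is the "y" one you asked about):

```julia
# Continuing from the gradient call above: pull out the spherical-harmonic
# derivatives. For a purely limb-darkened map this is just the Y_{0,0} entry,
# i.e. the derivative with respect to the disk-integrated flux.
flux_starry, grad = map[:flux](xo=0.1, ro=0.1, gradient=true)
dfdy = grad["y"]
```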
Also, make sure you have the latest version of starry installed from source on the master branch. I only just pushed the code using your equations to master yesterday.
Hmmm... the derivatives disagree severely. It's probably a bug on my end as usual. I did synchronize starry in the github repository, but when I tried conda update starry I received the message # All requested packages already installed. So it's also possible that I'm not using the correct version of starry. I've modified compare_to_starry.jl in the master branch to include derivatives.
@ericagol I'll run it on my end in a bit and let you know what I get. The conda version has not yet been updated with the Agol & Luger formulae; you'd have to do a manual install from git to get it up to date. But my derivatives should have been correct before...
@rodluger I have a test (test_transit_poly_gradient2.jl) which computes finite-difference gradients at BigFloat precision and compares them with the analytic gradients, and it seems to pass. So, if there is a bug, it's probably an error in how I'm interpreting the gradients, or something similar.
@ericagol When I run your script with nu=3, this is what I get.
Your values of df/dr and df/db seem to have a discontinuity at b ~ -0.75. But that's not the only issue -- I'm still digging.
I finished coding up and benchmarking the limbdark equations in starry on the limbdark branch. I'm finding several issues with the julia implementation. I created this julia script to easily compare fluxes to the Python/C++ version.

Here are some of the issues. Below is a full transit calculation for an lmax = 6 limb-darkened map. In some regimes the two codes agree exactly, but near the limb and near the center your julia code is giving spurious results:

The issues get worse when I go to higher order. Here's lmax = 30:

I find that I'm unable to go above about 30 with julia -- I get OverflowErrors in the factorial_lookup function. In some regimes of b, r space your code works, but I find that it becomes unstable near lmax = 50. My code actually gets unstable near that lmax as well. We should either identify the source of the instability or add something to the paper saying that's as far as we can currently go.

Finally, the run times. I find that for lmax up to about 20 my code is just as fast or faster than yours, but above that yours becomes faster. I suspect once you find the source of the errors above, you'll get longer runtimes. @dfm and I spent some time looking at how various calculations scale, and I'm very surprised by the n^1/3 or n^3/8 scaling you find. The number of operations scales linearly with the number of coefficients in some parts of the code, and quadratically in others, so the total scaling should be somewhere in between. Here is my version of your paper figure:

Note that up to lmax ~ 10 I find roughly the same scaling as you, but above that the runtime begins to scale more steeply. The dashed lines are linear scalings and the dotted lines are quadratic scalings: at high lmax, the slope is somewhere in between the slopes of those lines, consistent with my scaling arguments from poking around in the code.