hannorein / rebound

💫 An open-source multi-purpose N-body code.
https://rebound.readthedocs.io/
GNU General Public License v3.0
820 stars 217 forks source link

replace unsafe sqrt7 implementation for IAS15? #723

Closed rahil-makadia closed 9 months ago

rahil-makadia commented 9 months ago

The current implementation of the sqrt7 function in the ias15 source code does the expansion for x^(1/7) for 20 iterations without checking whether the returned value has converged. Although these 20 iterations are more than good enough in almost all situations, there are pathological cases.

One example is the Apophis propagation unit test from the ASSIST library. If we re-build the test while printing the value of x^(1/7) at every iteration in the current implementation, we see that the values are very different at each iteration for the first 20 for the first few integration steps.

Switching to a while loop that checks for convergence within a tolerance or a violation of a maximum iteration limit is the better approach proposed in this PR. This ensures convergence in all scenarios and avoids unnecessarily doing 20 iterations when they are not necessary.

hannorein commented 9 months ago

Hi Rahil,

Thanks for reporting this. I am aware of the issue and have already implemented a workaround - it's just not merged into the main branch yet. Rather than having a while loop as you suggest it simply rescales the input which should be faster and more stable numerically: https://github.com/hannorein/rebound/blob/newias/src/integrator_ias15.c#L92

Hanno

hannorein commented 9 months ago

I've merged it in just now. Hopefully this also fixes the Apophis unit test. Let me know if not!

rahil-makadia commented 9 months ago

Hi Hanno,

I see, I have an implementation for ias15 within a library myself and I'll have to look into switching the implementation there as well. A couple of questions just for my understanding though:

  1. Is there still a reason to keep the 20-iteration for loop? Wouldn't most cases converge well before that?
  2. In the new implementation, you scale down numbers that are too large. Is there a reason numbers larger than 1e-2 are scaled down? The comments mention values larger than 1e2 should be scaled. Is this a bug from a typo?

Cheers, Rahil

hannorein commented 9 months ago

Ah. Thanks that was a typo. Well spotted! Fixed now.

In short, I just ran a some tests to make sure everything is converged. You can probably get away with fewer iterations for some of the range, but then there would be more branches and loops to make sure it converges everywhere, so 🤷‍♂️. It's clearly not optimal, but this is called once a timestep so performance is not super critical.

Your method would work just as fine. This function is only used for determining the next timestep. However, note that if this function were to be used somewhere else in the code, then breaking out of the loop at a tolerance of 1e-14 would probably not be good enough. Not because 1e-14 is large compared to machine precision, but because the error would be biased. Having a fixed number of iterations is one way to mitigate this.

Here's a notebook that shows the errors of the new versus old method: https://gist.github.com/hannorein/7fd0c4f0a489b6d36504c8cf88320fce

hannorein commented 9 months ago

PS : If you find a better method, feel free to submit a PR! But honestly - I don't think it's worth it. In any case, thanks again for pointing this out and for spotting the typo!

rahil-makadia commented 9 months ago

I'll be sure to check that out. Thanks for all your help and taking care of this quickly!