valboz / VBMicrolensing

Microlensing computation code, including single, binary and multiple lenses
GNU Lesser General Public License v3.0

Infinite loops possible within do loop of VBMicrolensing::BinaryMag #3

Open mtpenny opened 1 month ago

mtpenny commented 1 month ago

[image: perf top output for a hung process]

It is possible for the do loop in BinaryMag to run forever, causing any program that uses it to hang. The image above is a quasi-static output of perf top for a hung process using VBMicrolensing, which identifies this do loop as the culprit. On two hung processes, the same calls are being made continuously. The cases that cause this are rare; I'm currently running further simulations to catch the parameters that trigger the hang and will update the issue once I have them.

The solution is probably easy - a max_iterations test that breaks out of the loop and sets an error flag that can be used to indicate a failure to the user.
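
A minimal sketch of what such a guard could look like, written as a generic helper rather than as a patch to BinaryMag. The names RefineResult, refine_with_cap and step are hypothetical and are not part of VBMicrolensing; the real loop tracks several quantities, so this only illustrates the cap-plus-flag idea.

```cpp
#include <functional>

// Hypothetical result type (not part of VBMicrolensing).
struct RefineResult {
    double value;     // last estimate produced by the loop
    int iterations;   // number of passes actually executed
    bool converged;   // false if the iteration cap was hit first
};

// Repeats `step` until the error it returns drops to `tolerance` or below,
// or until `max_iterations` passes have run, whichever comes first.
// `step` updates the estimate in place and returns the current error.
// As with any do-while loop, at least one pass always runs.
RefineResult refine_with_cap(const std::function<double(double&)>& step,
                             double initial, double tolerance,
                             int max_iterations) {
    RefineResult r{initial, 0, false};
    double err = 0.0;
    do {
        err = step(r.value);
        ++r.iterations;
    } while (err > tolerance && r.iterations < max_iterations);
    r.converged = (err <= tolerance);  // error flag for the caller
    return r;
}
```

A caller would check converged after the call and, when it is false, log the input parameters instead of hanging.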

valboz commented 1 month ago

Hi Matthew, are you using the BinaryMag2 function or are you calling BinaryMag directly? BinaryMag2 contains more checks to avoid infinite loops (and it is optimized with the quadrupole test). If you are finding these infinite loops with BinaryMag2, can you send me the exact parameters so that I can reproduce the problem?

Thanks

mtpenny commented 1 month ago

Hi Valerio,

I'm calling BinaryMag2:

    // Print the inputs at full precision before the call
    cout << setprecision(16) << s << " " << q << " " << xsrot
         << " " << ysrot << " " << rs << setprecision(6)
         << endl;
    amp = VBM.BinaryMag2(s,q,xsrot,ysrot,rs);

I'm still waiting on a run to get stuck, but will have the parameters available when one does.

Here's one set of parameters that has gotten stuck:

    s  = 0.3439553365896691
    q  = 2.717607702893884e-06
    xs = 7.531721896393501e-05
    ys = -0.0002606818364511856
    rs = 0.0003360659720343283

The other parameters are: VBM.a1=0.36 VBM.Tol=1.e-4

Here's the full screenshot of perf report for 10 seconds of the stuck process:

[image: perf report screenshot for the stuck process]

I'm not sure BinaryMag2 is getting the chance to act on detection of the infinite loop, because when it calls BinaryMag, BinaryMag is getting stuck in the do-while loop with the ending condition here:

    } while ((currerr > errimage) && (currerr > RelTol * Mag) && (NPS < NPSmax) && ((flag < NPSold)/* || NPS<8 ||(currerr>10*errimage)*/)/*&&(flagits)*/);

The process is still running, so let me know if you want any more information from it.

mtpenny commented 1 month ago

Interestingly, the process with the above parameters did manage to unstick itself after a long time, but then got stuck very nearby (s q xs ys rs):

    0.3439553365896691 2.717607702893884e-06 -7.808759522837607e-05 -0.0002638883693914945 0.0003360659720343283

Another two processes also got stuck at (s q xs ys rs):

    1.037991143184148 2.904291099724379e-06 0.005508860783625334 0.0005377544248048563 0.008655094421785355
    1.06172899003425 7.799369193384062e-07 -0.0001225743751406304 -0.0008792405329688662 0.0021834839603033

A common theme seems to be small mass ratios and high magnifications (~1000) with the source near the central caustic.

valboz commented 1 month ago

Hi Matthew,

indeed we are close to the boundary of the parameter range recommended in AdvancedControl. As explained therein, failures are below 1 in 1000 for each annulus. However, with limb darkening on, here you have more than 500 annuli! So, it is not surprising that you are finding errors sometimes.

Indeed, it is not clear to me why you need to push the code to such a demanding accuracy goal. With a magnification of 900, an absolute accuracy of 10^-4 corresponds to a relative accuracy of about 0.1 ppm, which is far beyond what we expect from any realistic observations. I suggest setting VBM.RelTol = 1.e-4, or 1.e-5 at most, while keeping VBM.Tol=1.e-4. Relative accuracy is what is actually relevant in the comparison with observations. The calculations will be much faster and safer in the high-magnification regime. In fact, I find that the magnification remains stable unless you push the tolerance requirements to unrealistically small values.
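
The arithmetic behind this can be sketched as follows. The two helper functions are hypothetical and not part of VBMicrolensing. The second one assumes, based on the loop condition quoted earlier in this thread, that refinement stops as soon as either the absolute or the relative target is met, so the looser of the two is the one that applies; how errimage is derived from Tol internally is not shown in this thread and is ignored here.

```cpp
#include <algorithm>

// Relative precision implied by an absolute tolerance at a given magnification.
// Example: 1e-4 / 900 is roughly 1.1e-7, i.e. about 0.1 ppm.
double implied_relative_precision(double abs_tol, double magnification) {
    return abs_tol / magnification;
}

// Assumed stopping rule: the loop ends once the error is below EITHER target,
// so the effective absolute error target is the larger (looser) of the two.
double effective_error_target(double abs_tol, double rel_tol,
                              double magnification) {
    return std::max(abs_tol, rel_tol * magnification);
}
```

Under these assumptions, Tol = 1e-4 with RelTol = 1e-4 at a magnification of 900 gives an effective absolute target of about 0.09 rather than 1e-4, which is why the calculation becomes so much less demanding.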

That said, failures do exist in extreme regimes, and it is my duty to improve the algorithm so as to avoid them as much as possible and to broaden the recommended parameter ranges.

Thanks for pointing these cases out to me.

valboz commented 1 month ago

By the way, in the dev branch, besides many other optimizations introduced by my collaborators, I have introduced a patch that should solve most of the problems for high-mag events. You may try and test this version.