nuclear-multimessenger-astronomy / nmma

A pythonic library for probing nuclear physics and cosmology with multimessenger analysis
https://nuclear-multimessenger-astronomy.github.io/nmma/
GNU General Public License v3.0

dynesty/ultranest samplers run indefinitely #202

Open bfhealy opened 1 year ago

bfhealy commented 1 year ago

Potentially related to default sampler parameters (#20): performing light_curve_analysis on an example candidate using the Bu2022Ye model and the dynesty/ultranest samplers appears to run indefinitely. I'm finding that the following calls to light_curve_analysis begin sampling but do not conclude:

dynesty:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_dynesty --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler dynesty

ultranest:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_ultranest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler ultranest

However, using pymultinest for this sampling finishes in a few minutes:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_pymultinest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler pymultinest

These runs were performed with the latest version of nmma and its requirements (including attempts with bilby-2.1.2 installed with pip and bilby-2.1.2.dev26+g9c1dda6c installed from source).

tsunhopang commented 1 year ago

For ultranest, could you try the following two independent approaches?

  1. Run with mpiexec and see how long it takes. (From my experience, it should take ~10 times longer than pymultinest.)
  2. Run it with the following extra command-line arguments: --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"
bfhealy commented 1 year ago

Hi @tsunhopang, thanks for the suggestions! I tried another ultranest run using mpiexec and the new arguments, and it sampled for several hours before failing with this error:

astropy.cosmology.core.CosmologyError: Best guess z=5.6740878366520494e-09 is very close to the lower z limit 0.0.
Try re-running with a different zmin.

I also saw this warning several times throughout the run:

UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points (stored for you in /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpsv0qi0se/extra/sampling-stuck-it%d.csv) are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.
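The frac_remain knob mentioned in this warning controls the nested-sampling termination test: sampling stops once the evidence that could still be hiding in the live-point region is only a small fraction of the total. The following is a toy sketch of that criterion (hypothetical function name, not ultranest's actual implementation), illustrating why raising frac_remain to 0.5, as the warning suggests, terminates earlier:

```python
import math

def should_stop(logz_dead, logz_live_upper, frac_remain=0.01):
    """Toy nested-sampling termination test.

    logz_dead:       log-evidence accumulated so far from dead points
    logz_live_upper: upper bound on the log-evidence still in the live
                     region (roughly max live log-likelihood + log volume)
    Stop when the live region can contribute at most `frac_remain`
    of the total evidence.
    """
    # Numerically stable logaddexp(logz_dead, logz_live_upper)
    logz_total = max(logz_dead, logz_live_upper) + math.log1p(
        math.exp(-abs(logz_dead - logz_live_upper))
    )
    return logz_live_upper - logz_total < math.log(frac_remain)

# Live region still holds noticeable evidence: the tight default keeps going...
print(should_stop(-10.0, -11.0, frac_remain=0.01))  # False
# ...but the looser frac_remain=0.5 from the warning stops here already.
print(should_stop(-10.0, -11.0, frac_remain=0.5))   # True
```

With an inefficient region proposal, those extra iterations under a tight frac_remain are exactly where a run can stall.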
tsunhopang commented 1 year ago

There seems to be some problem with the data; could you link it here?

tsunhopang commented 1 year ago

Also, please share the prior you used for the analysis.

bfhealy commented 1 year ago

Hi @tsunhopang, I tried two different Bu2022Ye analysis runs using nmma demo data. The data were for this ZTF candidate and AT2017gfo. The priors are here, and the function calls are below:

mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_ultranest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"
mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label AT2017gfo_ultranest --data ./example_files/lightcurves/AT2017gfo.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --trigger-time 57983.0 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"
tsunhopang commented 1 year ago

Could you try the following:

  1. Use a tighter prior on distance (e.g., for AT2017gfo, the distance is ~40 Mpc)
  2. The name of the time-shift parameter should be KNtimeshift rather than trigger_time
  3. The prior on KNtimeshift can be set to zero if there is a clear trigger; otherwise, it is still better to use a tighter prior
  4. The trigger time for AT2017gfo should be 57982.52852
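Points 1–3 might translate into prior-file lines along these lines, assuming the bilby-style prior syntax that nmma's .prior files use; the exact bounds here are illustrative, not recommended values:

```
# Tighter distance prior around AT2017gfo's ~40 Mpc
luminosity_distance = Uniform(minimum=30.0, maximum=50.0, name='luminosity_distance')
# Clear trigger for AT2017gfo, so the time shift can be fixed
KNtimeshift = 0.0
```

The trigger time itself (point 4) is passed on the command line via --trigger-time, as in the calls above.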
bfhealy commented 1 year ago

Thanks! I've started a new sampling run with these changes.

bfhealy commented 1 year ago

Hi @tsunhopang, I've tried running the ultranest sampling a few times using the following call, but each time it runs until my computer restarts due to a problem (presumably memory-related).

mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label AT2017gfo_ultranest_new --data ./example_files/lightcurves/AT2017gfo.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --trigger-time 57982.52852 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"

I also continue to see the inefficient-sampling warnings shared above. Perhaps different stopping criteria would help ultranest finish before running out of memory?

Changing the sampler to pymultinest and removing the --reactive-sampling and --sampler-kwargs arguments successfully produces light curve/corner plots and other sampling results, although I need to interrupt the process in my terminal window in order to enter any more commands.