tomasstolker / species

Toolkit for atmospheric characterization of directly imaged exoplanets
https://species.readthedocs.io
MIT License
22 stars 9 forks source link

Fitting with nested sampling (UltraNest or MultiNest) #87

Closed gabrielastro closed 9 months ago

gabrielastro commented 9 months ago
  1. In the documentation of fit.run_ultranest, about prior:

    The parameter is not used if set to ``None``.
    →
    A flat prior betwen XX and YY (log? lin?) is used

    or something like this to make it clearer.

  2. Is it possible to terminate the run early without losing all the data? It seems that Ctrl+C terminates the run but also corrupts the database because it remains open and the user cannot close it by hand. For MultiNest, Ctrl+C (most likely?) does not stop the run (as mentioned in #76). It would be nice to have for both UltraNest and MultiNest the possibility of graciously stopping without corrupting the database (and as a bonus, maybe even saving the points up to now, but this is really extra). I made a mistake starting a run with too many live points and could not correct this, only wait for it to finish…

  3. If 2. is possible, while the run is going, is there an easy way of plotting (in physical-parameter space) the distribution of points found up to now, if this is meaningful to estimate whether the run is actually finished or still needs a long time? Especially when testing, this would be practical. (As you probably know, "UltraNest has a visualisation to observe the current live points" but I am guessing this is deep in the package. For MultiNest, I cannot make a sense of how close it is to be finished, and just looking by hand at the distribution of points while the simulation is running should give a good idea.)

  4. I got the warning

    UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points (stored for you in […]/run1/extra/sampling-stuck-it%d.csv) are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.

    How can I "use a stepsampler" or "set frac_remain"? Or are these just symptons and for typical fitting situations, it means I am doing something wrong at a deeper level?

  5. Hoping to have a quick, rough run, I tried min_num_live_points=100 in fit.run_ultranest() but after "the main part" came the message:

    [ultranest] Evidency uncertainty strategy wants 199 minimum live points (dlogz from 0.50 to 1.18, need <0.5)

    and it wrote something about increasing the number of live points and it started runnning again… So in the end it took what felt like a similar amount of time as with 500 live points. Is there then (typically/often) no advantage to reducing the number of live points? And more generally, is it possible to do somehow a "quick run" yielding only an approximate best fit and errorbars, to see whether the model can match at all?

  6. I get the following warnings (pointing them out here in case they are easy to fix):

    
    […]/species/fit/fit_model.py:1848: InstrumentationWarning: instrumentor did not find the target function -- not typechecking species.fit.fit_model.FitModel.run_multinest.<locals>.lnprior_multinest

[…]/species/fit/fit_model.py:1887: InstrumentationWarning: instrumentor did not find the target function -- not typechecking species.fit.fit_model.FitModel.run_multinest..lnlike_multinest


7. Currently, only part of the priors, namely the ones set as hard boundaries (through `FitModel`/`bounds=`), are printed to screen ("Prior boundaries"), but neither those nor the ones set through `fit.run_*nest`/`prior={…}` [I see now it is deprecated, but still possible] are stored in the `run*/` folder nor the database. Also the parallax is a prior, set through `database.add_object()`, but that is not indicated as such nor stored either. For completeness, I guess that should be listed.

8. In `FitModel`, I accidentally set a key in `bounds` that is not one of the axes of atmospheric model, which lead to way too many:

KeyError: 'feh' Exception ignored on calling ctypes callback function: <function run..loglike at 0x7fa2440c0f70> Traceback (most recent call last): File "[…]/.local/lib/python3.9/site-packages/pymultinest/run.py", line 228, in loglike return LogLikelihood(cube, ndim, nparams) File "[…]/species/fit/fit_model.py", line 1906, in lnlike_multinest mpi_rank = MPI.COMM_WORLD.Get_rank() File "[…]/species/fit/fit_model.py", line 1161, in lnlike_func

One warning would have been enough :). Or even better: silently ignore the extraneous parameter (or just print one warning)? This way, when trying different atmospheric models, one could always pass all priors as one variable and thus avoid acrobatics to set only a part of them. It also lead to (or maybe it was a coincidence):

~$ head multinest/.txt 0.200000000000000048E-02 -0.139057853353940903-308 0.101724320650100708E+04 0.413902401924133301E+01 0.246439111232757568E+01 0.199086349010467556E+01 0.156033434761241327E+02


Note the missing `E` in ` -0.139057853353940903-308`.

Thanks again!
tomasstolker commented 9 months ago

I think that all the species-related points may have been addressed in the meanwhile.

For setting some of the MultiNest/UltraNest options, you can use the kwargs_multinest and kwargs_ultranest parameters in FitModel. You can have a look at the documentation pages of those packages for more details.

To look at the intermediate sampling results, you can have a look at the output folders from MultiNest/UltraNest.

In general, I would suggest to not add all data at once in case of a high SNR spectrum with broad wavelength range. Try to increase stepwise the complexity of the fit. Typically a fit shouldn't take longer than several minutes.

Point 8: if you leave out a parameter from bounds, then it will automatically set the available range from the grid in case the parameter is mandatory. So in this case I would just not include 'feh'. If you do want to check if the parameter is needed, then you can use get_parameters() or get_points() of ReadModel.

gabrielastro commented 9 months ago

Thanks for your helpful tips!

  1. For the documentation of normal_prior of FitModel: "The parameter is not used if the argument is set to None." → "A linear flat prior betwen the natural bounds is used for the parameters for which a prior was not set explicitly through bounds" or something like this would be more accurate.

  2. Indeed, Ctrl+C will not work for PyMultiNest, as Johannes Buchner explained. Avoiding mistakes is better :wink:.

  3. Thanks!

  4. Indeed, the kwargs_* you added answer this!

  5. Ok, starting with small complexity and increasing sounds good.

  6. I do not get the warning anymore, so, thank you.

  7. If I am not mistaken, the priors (including parameter ranges) used for the fits are not stored in the database. Might this be a good idea, since they do make a difference? While at it, you could maybe add the number of likelihood evaluations or CPU time as a reminder of how expensive a good run was :).

  8. Thanks for pointing to those functions! Sorry for not being clear but I meant that I accidentally set a parameter that is not one of the grid parameters. This could either get ignored or, better, give a single error and stop, instead of keeping on printing the error message. But here again, the best solution is not to pass accidentally unneeded parameters :wink:.

tomasstolker commented 9 months ago

Consider it done! The priors are stored in the database as the bounds and normal_prior groups of the sampling results.

gabrielastro commented 9 months ago

Excellent! Thank you. I will not wait for the next run to confirm it but thank you already :wink:.