tomasstolker / species

Toolkit for atmospheric characterization of directly imaged exoplanets
https://species.readthedocs.io
MIT License
22 stars 10 forks source link

Units in plot_spectrum, upper limits, photometry off model wavelength range, magnitude conversion, and other small things #104

Open gabrielastro opened 2 months ago

gabrielastro commented 2 months ago

Hello Tomas,

I gave myself a little refreshment break from another project and ran species again :smile:, now trying photometry. Here are a few issues (maybe in my understanding!) and suggestions:

1.) About units= of plot_spectrum: the y part needs to be given the units of the flux density, even if the plotted quantity is "flux" = lambda*F_lambda. This is actually practical (one needs to change only quantity, not its units too, if exploring different options) but at first surprising. Actually, could there be a units conversion issue too? I am using quantity='flux', units=("µm","erg s-1 cm-2 Å-1") in plot_spectrum and the result, comparing to a plot from someone else but also to a call without setting the units explicitly, seems to be off by 1 dex from what it should be in erg/s/cm². There seems to be the same problem when adding a spectrum for an object through add_object(…, spectrum=…, units=spec_units) (instead, I changed the units in the file externally to W/m²/µm).

2.) Currently, can it be that one can include a photometric upper limit only through flux_density ("The flux_density can also be used for an upper limit on a photometric flux, by setting the flux to zero and the uncertainty to the 1 sigma upper limit.")? Doing this with a magnitude (using e.g. 99 mag instead of zero flux) leads to plotting problems afterwards, and I am not sure whether it is used for the fit. Also, with flux_density={'filter': (0, onesigmaupplim)}, could the data point automatically be an upper limit symbol (horizontal bar + downward arrow of length 1 sigma)? Currently, it gets treated like the others.

3.) I tried using the AMES-Cond models for SDSS ugriz data. I had forgotten that the AMES-Cond spectra start at 0.5 µm and got the error:

Interpolating SLOAN/SDSS.g... [DONE]
Interpolating SLOAN/SDSS.i... [DONE]
Interpolating SLOAN/SDSS.r... [DONE]
Interpolating SLOAN/SDSS.u...---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
…
IndexError: index 0 is out of bounds for axis 0 with size 0.

Removing the u-band data (lambdaeff = 0.3546 µm) was enough to avoid the error. In hindsight, this is logical but (a) it might be useful to catch these cases at a higher level in species; actually, it could be sensible to just print a warning that the data points will be ignored, but maybe it is better to force the user not to use those data points (to avoid misunderstandings or "bad surprises"). (b) Why was there no problem with the g-band photometry, however, for which lambdaeff = 0.4670 µm < 0.5 µm?

Smaller ones:

4.) Small typo in https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database.add_object: "to the 1:math:sigma upper limit".

5.) By default, inc_spec is set to True in FitModel() even if the data do not contain spectra but only photometry; the error message is a bit obscure (TypeError: type of the return value must be dict; got NoneType instead, pointing to a line inc_spec = list(self.object.get_spectrum().keys()), from which one can guess what happened). Maybe make a warning and/or set the default to activating inc_spec only if there are such data?

6.) In the tutorial "Fitting data with a grid of model spectra", one now needs tag= in get_residuals(), and datatype='model' and spectrum=… should be removed. Note that for multi_photometry(), the parameter datatype='model' is (still) needed; maybe this is ok or maybe a very slight inconsistency.

7.) In Calculate multi-photometry (but also when preparing for model fitting), when printing the data, order the filters by wavelength within the instruments? It is weird to see the order H, J, Ks or g, i, r, u, z :nerd_face:.

8.) In plot_spectrum, you could allow 'lin' as a synonym for 'linear', and 'logarithmic' for 'log' and make default a "symmetric" pair (for instance "lin"/"log", which is conveniently short yet clear).

9.) When fitting a photosphere and a blackbody component (with 'disk_teff' and 'disk_radius' for bounds= in FitModel()), what is "R_bb" in the corner posterior plot? It is different from R_disk.

10.) Is there an easy way to convert from magnitudes in the AB system (as e.g. SIMBAD yields for SDSS data, at least in the "Basic data" section from which I copy the data by hand into the species script) to the Vega system (as species needs)? These kinds of conversions are always error-prone…

11a.) After adding several photometry for several bands (the normal way: database.add_object(… app_mag=…)), the magnitudes information printed to screen has Mean wavelength (um) = 1.2808e+01 for all bands… :).

11b.) When you correct this, could you also print on the same line the filter width used to calculate "Flux (W m-2 um-1)" (actually, this should be "Flux density)", for convenience? Often one needs actually the filter-integrated flux (e.g. in W/m²), e.g. to compare the possible contribution of an accretion line to the total filter flux.

11c.) For the magnitudes, maybe print three digits after the period because sometimes, the precision is better than 0.01 mag (so currently it gets rounded to 0.00).

12.) Add R_V (ism_red) in legend entry for the model if present in the fitting (through bounds); I looked into the source and it is not quite clear why ism_ext (A_V) does get added but not ism_red. Tiny one: I guess also the variable name prefix ism_ could be made more general… ext_val (for "value") and ext_red? But "ism_" is easy to remember.

13.) Would it be possible (well, any is possible…) and easy to allow (implicit) wildcards (as for the inflation parameters) in the filter names for in the data styles? For instance, in the β Pic b example, to have all photometry from Paranal/NACO or from GPI in the same style, one could do plot_kwargs=[{ …, 'Paranal/NACO': {'marker': 'o', 'color': 'red', …}, 'GPI_': {'marker': 's', 'color': 'blue', …}, …}] instead of listing by hand every filter. An extra nice feature would be to allow one to additionally set by hand one label for one of those filters for the legend (e.g., … 'Paranal/NACO.J': {'label': 'NACO'}, …) to have only one entry in the legend. Obviously, this is just an idea and does not have priority…

Thanks again for this great tool!

Gabriel

gabrielastro commented 2 months ago

14a.) In Calculate residuals, a photometry point is outside the wavelength range of the models, so the flux is set to NaN, which is fine. However, right after (in the residuals part), species prints:

Number of data points = 10
Number of model parameters = 8
Number of fixed parameters = 0
Number of degrees of freedom = 2

chi2 = nan
reduced chi2 = nan

but actually I was fitting only four data points; I just wanted to plot the other data points too. I guess they should not enter the chi2 calculation. Something similar if fitting with more model parameters than data points: one gets a negative number of degrees of freedom :slightly_smiling_face:.

14b.) The second, related issue is that the NaN somehow leads to ValueError: zero-size array to reduction operation maximum which has no identity, coming from max_tmp = np.max(np.abs(residuals.photometry[item][1][finite])) in plot_spectrum.py. Of course, one could set by hand which filters to include through get_residuals(…, inc_phot=…) but that will change depending on the wavelength range covered by the atmospheric models…

Thanks again!

gabrielastro commented 1 month ago

Adding to the list (let me know if you prefer smaller-length "Issues"!):

15.) There is one extra space: In species/data/database.py at line 1560: f" correlation matrix into a covariance "f"correlation matrix into a covariance "

16.) For FitModel, the documentation says that "With bounds=None, automatic priors will be set for all mandatory parameters" but I got an error: argument of type 'NoneType' is not iterable pointing to the if "teff" in self.bounds part of fit_model.py:532. Same thing if doing just fit = FitModel(object_name='myPMC, model='atmo'). At least one parameter (teff or logg or…) currently needs to be set.

17.) About the output directory for fit.run_multinest(tag=…, …, output=…), the "Path that is used for the output files from MultiNest": I did not realise that one needs to put a slash at the end by hand! Otherwise the eight files per run are next to the empty directories (one of them normally hidden because it is called .txt), which is a bit of a mess. And this cannot be changed easily after the fact, right?, because all the names in the database would need to be updated too. Maybe silently (or with a warning) add a slash at the end if the user does not? This is a problem only for MultiNest. By the way, maybe the two functions run_ultranest and run_multinest could be unified to one (with n_or_min_num_live_points) with an option choosing the variant? But just an idea like that…

18.) In another list but I again ran into the issue :frowning_face:: Convert silently resolution in database.add_object() to a float? Currently: type of argument "spec_res" must be one of (float, NoneType); got numpy.int64 instead, which is a bit too strict. Keeping resolution an integer makes it easy to combine it with filenames, for instance, and one usually thinks of integer resolution, not fractional numbers…

19.) Add an option to print (i) the model name and/or (ii) the bounds and/or the (iii) the evidence and/or or some other information on the posterior plot? At least the name would be quite practical. Currently, one half of the page is free for useful information… :smiley:

gabrielastro commented 3 weeks ago

Edit: I added to Nr. 11 above a suggestion to print the filter width, for convenience, and also one digit more to the magnitudes.

gabrielastro commented 2 weeks ago

20.) Still in the category "Small tweaks": for many filters, the peak transmission is much lower than 100% (e.g.: SDSS/z, peaking at 8%), but the transmission subplot always goes to 1, so that the profile can be hard to see. Add the option of automatically adjusting the ymax to the maximum of the visible curves (plus a little margin, of course, to avoid the curve touching the top axis)?