tomasstolker / species

Toolkit for atmospheric characterization of directly imaged exoplanets
https://species.readthedocs.io
MIT License

sonora elf-owl as successor to bobcat and cholla #97

Closed wbalmer closed 6 months ago

wbalmer commented 7 months ago

The Sonora team recently released their Elf Owl model grid on Zenodo (T-dwarfs here, but there are also separate L-dwarf and Y-dwarf entries, maybe because of file size?). These are supposedly the successors to the Bobcat and Cholla grids, and they vary all the interesting parameters (abundances and Kzz). It would be very useful to have them in species!

tomasstolker commented 7 months ago

The grids are huge! Probably split up for that reason indeed. I will start with the T-type grid, since that one is probably of most interest to you 😉. These are cloud-free grids though.

gabrielastro commented 7 months ago

I would also be very interested! Maybe it would be a good occasion to introduce the possibility of a (at least block-wise) non-constant resolution in the models? Already for the Sonora family this would make sense: Sonora-resolutions. I had checked a bit and think the situation was similar for other grids, but sorry if this picture is not accurate!

With a constant resolution, the spectrum files either have (too many) points where there are no data originally, which is problematic if the data have high resolution (the user would generally not know that the Delta lambda of the model files in species is artificially larger) and makes the files larger (a minor concern, but still), or one loses data when downsampling. But a non-constant resolution might not be so easy to implement; if it were easy, you would have done it from the beginning :wink:.

When convolving with a non-constant kernel, one can "stretch the data with the inverse scale (i.e., at places where you would want the Gaussian width to be 0.5 the base width, stretch the data to 2x)", but I am not sure whether this is applicable everywhere you deal with the model data. But maybe block-wise constant would not be too much work?
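For what it is worth, the stretching idea can be sketched with numpy/scipy (this is not the species implementation, just an illustration of the trick: on a log-uniform wavelength grid, a fixed-width Gaussian kernel corresponds to a constant lambda/Delta_lambda):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_to_resolution(wavel, flux, r_out):
    """Smooth a spectrum to (approximately) constant resolving power r_out.

    Resampling to a log-uniform wavelength grid is the 'stretching with
    the inverse scale' trick: a single Gaussian kernel then corresponds
    to a constant lambda/Delta_lambda across the whole spectrum.
    """
    # Log-uniform grid with the same number of points as the input
    log_wl = np.linspace(np.log(wavel[0]), np.log(wavel[-1]), wavel.size)
    flux_log = np.interp(log_wl, np.log(wavel), flux)

    # Kernel width in grid points: FWHM = 1/R in log-lambda units
    dlog = log_wl[1] - log_wl[0]
    sigma = (1.0 / r_out) / (2.0 * np.sqrt(2.0 * np.log(2.0))) / dlog

    flux_smooth = gaussian_filter1d(flux_log, sigma)

    # Interpolate back onto the original (possibly irregular) grid
    return np.interp(np.log(wavel), log_wl, flux_smooth)

# Example: smooth a toy high-resolution spectrum down to R = 500
wl = np.linspace(1.0, 2.0, 5000)
fl = 1.0 + 0.1 * np.sin(2.0 * np.pi * wl / 0.01)
fl_smooth = smooth_to_resolution(wl, fl, 500.0)
```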

tomasstolker commented 7 months ago

A constant lambda/Delta_lambda is needed to enable fast smoothing. For fitting a specific wavelength regime and/or resolution, you can use the add_custom_model method of Database. That should be quite straightforward to use!

tomasstolker commented 7 months ago

The ideal solution would be if modelers provided grids with constant log-wavelength sampling instead of the typical constant wavenumber spacing...
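To make the contrast concrete, here is a small numpy check (illustrative numbers only): a constant-wavenumber grid has a resolving power that varies strongly across the band, while a log-uniform grid has constant R everywhere.

```python
import numpy as np

# Constant wavenumber grid (cm^-1), as model grids are often distributed
wavenum = np.arange(800.0, 12500.0, 1.0)  # 1 cm^-1 steps
wavel = 1e4 / wavenum[::-1]               # wavelength in micron, ascending

# Local resolving power R = lambda / Delta_lambda varies strongly:
# with Delta_nu = 1 cm^-1 it equals nu, i.e. R = 1e4 / wavel here,
# so it is highest at the short-wavelength end
r_local = wavel[:-1] / np.diff(wavel)

# Resampling to a log-uniform grid gives constant R at every index
log_grid = np.logspace(np.log10(wavel[0]), np.log10(wavel[-1]), wavel.size)
r_const = log_grid[:-1] / np.diff(log_grid)
```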

gabrielastro commented 7 months ago

I agree that add_custom_model sounds very convenient and easy, and I look forward to trying it. However, I actually meant in part the data distributed with species, which get downloaded from your server. Maybe the Sonora modelers could provide their models at constant resolution, but perhaps that is too expensive… Am I mistaken, or is the sampling of the Cholla data in species currently too low below 3 µm and too high above (looking at the plot above)?

tomasstolker commented 7 months ago

Yes, that could well be! It is important to check whether the resolution is sufficient for the data used, see: https://species.readthedocs.io/en/latest/species.data.html#species.data.database.Database.add_model

For example, certain wavelength regimes might be inaccurate for comparing with high-resolution data while still fine at low-resolution or photometry.

tomasstolker commented 7 months ago

The Sonora Elf Owl models have a fixed lambda/d_lambda so I did not need to resample them. For now I have only included the T-type grid, but I could add the Y and L grids later on if needed.

The log(g)=3.0 points were not regularly sampled in the grid, so I simply removed all spectra with log(g)=3.0. I created some plots in the Teff = 800-1000 regime, for which the grid was complete. However, please check when using add_model if any grid points are missing, similar to the log(g)=3.0 points, in which case the grid may need to be adjusted further.

I will upload the grid later today, which will have the tag 'sonora-owlelf-t'.

tomasstolker commented 7 months ago

Here is an example of the spectra. Some of the changes between grid points are highly non-linear, so fitting/interpolating may not always give accurate results. Instead, the CompareSpectra class can also be used to calculate chi2 maps across the grid. I can send an example if needed, since there is not yet a tutorial available.

spectra_co spectra_feh spectra_logg spectra_logkzz spectra_teff
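Since a CompareSpectra tutorial is not yet available, here is a generic numpy sketch of a chi2 map over a model grid. This is not the species API; all arrays are hypothetical, and each model is scaled to the data with an analytic flux-calibration factor.

```python
import numpy as np

# Hypothetical pre-smoothed model grid: (n_teff, n_logg, n_wavel) fluxes
# evaluated on the same wavelength grid as the observed spectrum
rng = np.random.default_rng(1)
teff_grid = np.array([800.0, 900.0, 1000.0])
logg_grid = np.array([4.0, 4.5, 5.0])
model_flux = rng.uniform(0.5, 1.5, (3, 3, 50))

# Fake "data": one grid node plus a little noise
obs_flux = model_flux[1, 2] + rng.normal(0.0, 0.01, 50)
obs_err = np.full(50, 0.01)

chi2_map = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        # Analytic best-fit scaling of model to data (flux calibration)
        scale = np.sum(obs_flux * model_flux[i, j] / obs_err**2) / np.sum(
            model_flux[i, j] ** 2 / obs_err**2
        )
        chi2_map[i, j] = np.sum(
            ((obs_flux - scale * model_flux[i, j]) / obs_err) ** 2
        )

best = np.unravel_index(np.argmin(chi2_map), chi2_map.shape)
print(f"Best node: Teff={teff_grid[best[0]]}, logg={logg_grid[best[1]]}")
```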

gabrielastro commented 7 months ago

Wow! Very nice. Thank you! The non-linear dependence is probably even stronger at high resolution. Maybe it would be worth it to try contacting Mukherjee et al. about calculating spectra at intermediate parameter values?

wbalmer commented 7 months ago

I was able to download and add the model grid, but initially it would stall and the kernel would die before it finished adding all the grid points to the database. I had to specify a wavelength sampling, e.g.

```python
database.add_model(model=model_choice,
                   teff_range=(600., 1200.),
                   wavel_sampling=30000.,
                   wavel_range=(0.75, 13.))
```

So there might still be an issue with the native sampling, or with corrupt or missing grid points in the current tgz file. I made sure to monitor the memory of my machine while adding the model to the database, and I don't think my machine's memory was the issue.

After adding the model, the fit worked great! (Well, great in that I was able to finish a sampling run and the resulting steps ran normally; none of these models fit my data with reasonable parameters 😭.)

gabrielastro commented 7 months ago

Hello @wbalmer,

Sorry that no model fits with reasonable parameters… :cry:. Could it be linked to the too-coarse spacing in parameters such as logg or [Fe/H] (at least around 900 K, looking at the plots from Tomas above)? If you step one grid point away on either side of the best fit, does it look like the best fit may have been missed? Do you get the same best fit if you downgrade the resolution of your data?

About the initial stalling: if it were due to corrupt or missing files, it might say Fix missing grid points with a linear interpolation: … and/or Could not interpolate N grid points so storing zeros instead. [WARNING] The grid points that are missing: …. Actually, these are the error messages from a previous version of species and they might have changed. You can easily do at least a few spot checks by looking by hand (with head and tail at least) at a few random files in […]/species/data/sonora-owlelf-t/, and even check whether the correct (same) number of files is present for different values of, say, [Fe/H] (e.g., ls sonora-owlelf-t/*_feh_0.0*dat | wc -w, varying the feh value).

How did you monitor the memory usage, by the way? (I am not sure how one should do it!)
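(For the record, one stdlib-only option is tracemalloc; note that it tracks Python-side allocations, so buffers allocated inside C extensions such as HDF5 will not show up, and watching the process in top or Activity Monitor is a useful complement.)

```python
import tracemalloc

# tracemalloc tracks memory allocated through the Python allocator only;
# memory held by C extensions (e.g. HDF5 buffers) will not appear here
tracemalloc.start()

data = [bytearray(10**6) for _ in range(50)]  # ~50 MB of Python allocations

current, peak = tracemalloc.get_traced_memory()  # both in bytes
print(f"current = {current / 1e6:.1f} MB, peak = {peak / 1e6:.1f} MB")
tracemalloc.stop()
```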

Maybe this helps at least one of us… Thanks!

Gabriel

tomasstolker commented 7 months ago

Thanks for testing! Adding the full grid will be too memory intensive for a typical machine, I think. The TAR file by itself is 36 GB, and the array with all spectra is added to the HDF5 database at once. On my MacBook with 16 GB of memory, I could add a Teff range of 200 K without resampling.
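Adding the grid in Teff chunks is one way to keep the peak memory bounded, assuming repeated add_model calls with adjacent teff_range values accumulate in the same database (worth verifying). The helper below (teff_chunks, a hypothetical name) just computes the sub-ranges:

```python
def teff_chunks(t_min, t_max, step):
    """Split [t_min, t_max] into consecutive sub-ranges of width <= step."""
    edges = []
    t = t_min
    while t < t_max:
        edges.append((t, min(t + step, t_max)))
        t += step
    return edges

# Elf Owl T-grid bounds, in 200 K chunks
chunks = teff_chunks(575.0, 1200.0, 200.0)
# -> [(575.0, 775.0), (775.0, 975.0), (975.0, 1175.0), (1175.0, 1200.0)]

# Sketch of the per-chunk calls ('database' would be a species Database):
# for t_lo, t_hi in chunks:
#     database.add_model(model="sonora-elfowl-t", teff_range=(t_lo, t_hi))
```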

The Sonora Elf Owl grids are cloud-free, so a poor fit probably means that your object is cloudy 😊.

tomasstolker commented 7 months ago

I have added the L- and Y-type grids. For these grids I have also removed the log(g)=3.0 points, since these were not regularly sampled. Below some plots of the Y spectra.

spectra_logg spectra_logkzz spectra_teff spectra_co spectra_feh

gabrielastro commented 7 months ago

Nice! Is the irregular logg spacing (Delta logg = 0.2 / 0.2 / 0.3, cycling) a problem?

tomasstolker commented 7 months ago

And here are some examples for the L grid. Not sure what happens at C/O=2.5; probably due to a low H2O abundance.

spectra_co spectra_feh spectra_logg spectra_logkzz spectra_teff

tomasstolker commented 7 months ago

> Nice! Is the irregular logg spacing (Delta logg = 0.2 / 0.2 / 0.3, cycling) a problem?

No, the grid is just not complete for log(g) = 3.

gabrielastro commented 7 months ago

Ah! ok, thanks. And likewise, the missing C/O=2.0 point is not a problem (variable Delta C/O), I guess?

gabrielastro commented 7 months ago

The big change between C/O=1.5 and =2.5 is similar to the transition at C/O=0.73 in Mollière et al. (2015): Mollière_et_al_2015_Abb_13b

gabrielastro commented 2 months ago

If I may briefly and politely re-open this: would it make sense to also provide a low-resolution version of the model grids for when one only wants to compute photometry? Otherwise it is a slow trial-and-error of restricting the range of Teff values or the other parameters, with freezing and crashing when it does not work, and variations from machine to machine. Maybe R = 3000 or 10,000 would be enough for most/all photometry purposes (not sure).

tomasstolker commented 2 months ago

Easiest would be to use the wavel_range and wavel_sampling parameters of add_model. You would still need to download the grid and create the database with the downsampled grid. But you would only need to do that once since the database can be copied or pointed to from the config file.

gabrielastro commented 2 months ago

Ah, right! Thanks for pointing that out. I was actually setting both parameters, but did not realise that this would prevent all the models from being read into memory at once (which makes sense), i.e., that it should be the solution. Then my problem was that wavel_sampling = 10,000 was apparently still too high for the machine I was using. (It is still down so I cannot yet try again :face_in_clouds:, but presumably that will solve it.) Thanks!

tomasstolker commented 2 months ago

It will store all spectra into a single array when adding to the database. That could be changed but it is not something I can easily do.

A sampling of 10,000 is quite high. For fitting only (broadband) photometry, I think that ~100 should be sufficient.

gabrielastro commented 2 months ago

It will store all the downsampled spectra, though, right? Then that would not be an issue for the RAM.

From the Sonora Bobcat README: "In our experience, a minimum of 10 wavelength points is necessary to obtain a reasonable average flux over a wavelength interval". So if filters have R ~ 10–100 (broad- to narrow-band photometry), I guess 100–1000 should indeed be enough.
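The arithmetic behind that estimate can be sketched as follows (illustrative filter widths; required_sampling is a hypothetical helper): with at least 10 points across a filter, the required sampling resolution is roughly 10 times the filter's own lambda/width.

```python
def required_sampling(lam_central, filter_width, n_points=10):
    """Sampling resolution lambda/Delta_lambda needed to place n_points
    across a filter of the given width (the Bobcat README rule of thumb)."""
    delta_lam = filter_width / n_points  # spacing between sampled points
    return lam_central / delta_lam       # = n_points * (lam / width)

# Broadband filter, R_filter ~ 5.5 (e.g. 0.4 um wide at 2.2 um)
r_broad = required_sampling(2.2, 0.4)     # -> 55
# Narrow-band filter, R_filter ~ 100 (0.022 um wide at 2.2 um)
r_narrow = required_sampling(2.2, 0.022)  # -> 1000
```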

tomasstolker commented 2 months ago

Yes, exactly, the downsampled spectra, so at low sampling resolution that should probably be fine for most/all grids.

gabrielastro commented 1 month ago

This is also related to issue #104, but how can one fit, e.g., the SPHERE_YJH spectrum of PDS 70 b (database.add_companion(name='PDS 70 b')) with the sonora-cholla grid, which starts at 1.0 µm? species says Interpolating SPHERE_YJH... [DONE], but then MultiNest prints ValueError: A value in x_new is below the interpolation range., which is understandable…
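A possible workaround (an illustrative numpy sketch, not a species feature; the array names are hypothetical) is to trim the observed spectrum to the grid's wavelength coverage before fitting, so the interpolator is never queried below 1.0 µm:

```python
import numpy as np

# Hypothetical observed YJH spectrum: wavelengths in micron
obs_wavel = np.linspace(0.95, 1.65, 40)
obs_flux = np.ones_like(obs_wavel)
obs_err = np.full_like(obs_wavel, 0.05)

# Coverage of the model grid (sonora-cholla starts at 1.0 micron)
grid_min, grid_max = 1.0, 10.0

# Keep only the data points inside the grid's coverage
mask = (obs_wavel >= grid_min) & (obs_wavel <= grid_max)
obs_wavel, obs_flux, obs_err = obs_wavel[mask], obs_flux[mask], obs_err[mask]
# The interpolator is now never queried below 1.0 micron
```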

gabrielastro commented 1 month ago

And about (in species/data/model_data/model_data.json):

```json
    "sonora-elfowl-l": {
        "teff range": [1300, 2400],
…
    "sonora-elfowl-t": {
        "teff range": [575, 1200],
```

Does this mean that we cannot fit objects with a Teff between 1200 and 1300 K :grin:? Duplicating big data files is not elegant but it would make it possible to fit to cover all Teffs (and doing fits in two parts). Edit: Actually, to avoid edge effects, I guess one should have an overlap of one full grid bin, i.e., have up to e.g. 1300 K in the T-dwarf grid and down to 1200 K in the L-dwarf grid (assuming Delta Teff = 100 K).