New seeing databases based on simsee and Pachon DIMM data from 2004-2017

ehneilsen commented 6 years ago

More information can be found at https://github.com/LSSTDESC/obs_strat, and the data files are at NERSC in this directory: /global/project/projectdirs/lsst/survey_sims/

Here are the database files suitable for use by opsim4, posted here at Lynne's request.

simsee_pachon7.db.gz simsee_pachon6.db.gz

ehneilsen commented 6 years ago

The title should read 2004-2017, not 2014-2017

rhiannonlynne commented 6 years ago

@tribeiro note that the obs_strat repo at LSSTDESC is private to the DESC group.

My summary from there and from conversations on slack:
The on-site data that these input files are based on are Pachon DIMM measurements taken over 13 years (2004 - 2017). From the documentation in one of @ehneilsen's ipython notebooks (https://github.com/LSSTDESC/obs_strat/blob/master/doc/seeing/Model_Pachon_r0.ipynb):

The data from the DIMM Pachon loaded here was provided to me by Edison Bustos in email on 2018-03-21. It contains time stamps and seeing estimates from Pachon DIMM data. These estimates arise from an equation that assumes Kolmogorov seeing, and are calculated for a wavelength of 500nm. See equation 5 of Tokovinin 2002 (2002PASP..114.1156T).

The Kolmogorov-interpreted DIMM measurements were acquired roughly every minute, but there are gaps, some apparently several months long judging from the plots. (A table indicates 1-minute spacing, but the plots look sparser. The plots are averaged per month, so the overall spacing is uncertain.)

First, values larger than 5" and smaller than 0.1" were rejected. Then the Kolmogorov FWHM was turned into a Fried parameter (r_0) and used to calculate the von Karman FWHM. (FWHMeff? FWHMgeom? What does FWHM mean in this context?)

raw_dimm['r0'] = 0.98*5e-7/np.radians(raw_dimm.seeing/(60*60))
raw_dimm['log_r0'] = np.log10(raw_dimm.r0)

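def calc_FWHM_vK(fwhm_dimm, outer_scale=30, wavelength=5.0e-7):
    # Fried parameter r0 (m) from the Kolmogorov FWHM (arcsec); Tokovinin (2002) eq. 5
    r0 = 0.98*wavelength/(np.radians(fwhm_dimm/(60*60)))
    # von Karman correction for a finite outer scale (m); Tokovinin (2002) eq. 19
    fwhm = fwhm_dimm * np.sqrt(1.0 - 2.183*np.power(r0/outer_scale, 0.356) )
    return fwhm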

raw_dimm['vk_seeing'] = calc_FWHM_vK(raw_dimm.seeing, 30)
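
For reference, the rejection step and the two conversions described above can be combined into one self-contained sketch. The synthetic seeing values and the helper names (kolmogorov_fwhm_to_r0, von_karman_fwhm, clean_and_convert) are assumptions for illustration and are not taken from the notebook; only the 0.1" and 5" cuts, the 500 nm wavelength, and the 30 m outer scale come from the text. The 2.183 coefficient is the one in equation 19 of Tokovinin (2002).

```python
import numpy as np
import pandas as pd

ARCSEC_PER_DEG = 60 * 60


def kolmogorov_fwhm_to_r0(fwhm_arcsec, wavelength=5.0e-7):
    """Fried parameter r0 (m) from a Kolmogorov FWHM (arcsec)."""
    return 0.98 * wavelength / np.radians(fwhm_arcsec / ARCSEC_PER_DEG)


def von_karman_fwhm(fwhm_arcsec, outer_scale=30.0, wavelength=5.0e-7):
    """von Karman FWHM (arcsec) for a finite outer scale (m)."""
    r0 = kolmogorov_fwhm_to_r0(fwhm_arcsec, wavelength)
    return fwhm_arcsec * np.sqrt(1.0 - 2.183 * np.power(r0 / outer_scale, 0.356))


def clean_and_convert(raw_dimm, min_seeing=0.1, max_seeing=5.0):
    """Reject out-of-range seeing values, then add r0, log_r0 and vk_seeing."""
    keep = (raw_dimm["seeing"] >= min_seeing) & (raw_dimm["seeing"] <= max_seeing)
    dimm = raw_dimm.loc[keep].copy()
    dimm["r0"] = kolmogorov_fwhm_to_r0(dimm["seeing"])
    dimm["log_r0"] = np.log10(dimm["r0"])
    dimm["vk_seeing"] = von_karman_fwhm(dimm["seeing"])
    return dimm


# Synthetic values (arcsec), for illustration only; the first and last are rejected.
raw_dimm = pd.DataFrame({"seeing": [0.05, 0.6, 0.8, 1.2, 7.0]})
dimm = clean_and_convert(raw_dimm)
print(dimm)
```

With a finite outer scale the von Karman FWHM always comes out smaller than the Kolmogorov FWHM it was derived from, which is a quick sanity check on the output.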

Note that Bo Xin completed an analysis of the LSST DIMM measurements including simulations with ImSim/PhoSim to include the optics and generate a full PSF. In calculating the delivered PSF in LSST images, he also used a von Karman profile, also with an outer scale of 30 m (see Document-20160 and Document-18208). But he calculated FWHMeff explicitly, as the FWHM of a Gaussian that would encompass the same number of pixels as should be used when calculating the SNR of a star with the von Karman PSF. (NOTE: for OPSIM inputs, we actually DO want the Kolmogorov DIMM FWHM, as that's what we expect and use to turn into FWHMeff/FWHMgeom in SeeingModel.)

However, the von Karman FWHM and the DIMM / Kolmogorov FWHM do track each other with a predictable relationship.

The other, more important, aspect is how to fill in any gaps in the DIMM measurements and how to interpolate between the DIMM measurements.

The time series modeling @ehneilsen did assumes that the distributions of measurements are Gaussian. From his notebook: "Okay, neither [$r_{0}$ nor $\log(r_{0})$] are really Gaussian, but $\log(r_{0})$ is closer to normal than $r_{0}$. One notable feature is the low-value tail of $\log(r_{0})$. This suggests that there is a poor-seeing tail of the distribution that my model will not be able to model well." (I think this is important to note, because a poor-seeing tail could be important. It's not clear to me how many observations would fall into this tail.)

Next, the time series (of all values) is resampled onto 5-minute intervals using the nearest values. This resampled series is investigated for variations within a night (first half/last half, twilight, etc.) and for dependence on the sun's altitude.
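
A minimal sketch of this kind of nearest-value resampling with pandas is below. The synthetic time stamps, the variable names, and the 150-second tolerance are assumptions for illustration; the notebook may do this differently.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic, irregular time stamps roughly one minute apart (illustrative only).
start = pd.Timestamp("2010-01-01 00:00:00")
offsets = np.cumsum(rng.uniform(40, 80, size=300))  # seconds since start
times = start + pd.to_timedelta(offsets, unit="s")
log_r0 = pd.Series(rng.normal(-0.9, 0.1, size=300), index=times)

# Regular 5-minute grid spanning the data.
grid = pd.date_range(times[0].ceil("5min"), times[-1].floor("5min"), freq="5min")

# Take the nearest measurement for each grid point. With a tolerance, grid
# points that have no measurement within 150 s become NaN instead of being
# filled from far away, so long gaps stay visible as gaps.
resampled = log_r0.reindex(grid, method="nearest", tolerance=pd.Timedelta("150s"))
print(resampled.head())
```

Keeping a tolerance matters for the gap handling discussed later: without it, a months-long gap would be silently filled with copies of the measurement on either side.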

A dependence on the time of night seems to exist, but it is hard to understand (present in some months, not all), so it will not be modeled here.
There is a clear difference in the monthly average (log r_0) between summer and winter.
The notebook also looked at the dependence on night of the year (which should be related to the monthly average): variation can be seen, and there is a correlation in the nightly mean value between nights (the autocorrelation function shows long-term oscillations).

Fit a simple sine function to the average nightly values. From the notebook: "simple sine, fit to the data using simple linear regression and the harmonic addition theorem. This fit is not perfect, but it grabs a lot of the seasonal variation."

(The fit is better on the average monthly data than on the nightly data.)

Much of the autocorrelation of the nightly log r_0 values goes away.
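
As a sketch of what a sine fit by linear regression plus the harmonic addition theorem can look like: regress on sin and cos terms of a fixed period, then combine the two coefficients into one amplitude and phase. The function name fit_annual_sine, the 365.25-day period, and the synthetic nightly means are assumptions for illustration, not the notebook's code.

```python
import numpy as np


def fit_annual_sine(day, values, period=365.25):
    """Fit values ~ A*sin(w*day + phase) + offset by linear least squares."""
    day = np.asarray(day, dtype=float)
    w = 2.0 * np.pi / period
    # The model is linear in (a, b, c): a*sin(w t) + b*cos(w t) + c
    design = np.column_stack([np.sin(w * day), np.cos(w * day), np.ones_like(day)])
    (a, b, c), *_ = np.linalg.lstsq(design, values, rcond=None)
    # Harmonic addition theorem: a*sin(x) + b*cos(x) = A*sin(x + phase),
    # with A = sqrt(a^2 + b^2) and phase = atan2(b, a).
    amplitude = np.hypot(a, b)
    phase = np.arctan2(b, a)
    return amplitude, phase, c


# Synthetic nightly mean log r_0 values over three years (illustrative only).
rng = np.random.default_rng(3)
nights = np.arange(3 * 365)
true_amp, true_phase, true_offset = 0.05, 1.0, -0.9
nightly_mean = (true_amp * np.sin(2.0 * np.pi * nights / 365.25 + true_phase)
                + true_offset + rng.normal(0.0, 0.02, size=nights.size))

amplitude, phase, offset = fit_annual_sine(nights, nightly_mean)
residuals = nightly_mean - (amplitude * np.sin(2.0 * np.pi * nights / 365.25 + phase) + offset)
print(amplitude, phase, offset)
```

The residuals from such a fit are the quantity that the autoregressive modeling then works on.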

After removing the overall seasonal trends with this fit, the residuals are used to fit a generalized damped random walk model (an autoregressive, or AR, model), first on the nightly means and then on the data within a night. The AR fit was done with the AR class from statsmodels (from statsmodels.tsa.ar_model import AR). Since there are non-continuous sequences, these fits were done on the longest continuous, evenly spaced sequence possible.

For the nightly means, the average result for the AR parameters was: "So, for our regressive model, we need innovations with a standard deviation of 0.08, and an L1 term of 0.2." (The constant term averaged 0, although with a standard deviation of ~1.6.)

For the variations within a night (already resampled to 5 minute intervals), the results look pretty similar for all nights throughout a year. We need an innovation with a standard deviation of 0.05 and an L1 term of 0.7. (Here the constant term is 0, with a very small standard deviation.)
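
To make the quoted numbers concrete, here is a minimal numpy-only sketch of a first-order autoregressive process, x_t = c + phi*x_(t-1) + e_t, where I am assuming the "L1 term" is the lag-1 coefficient phi and the "innovation" is the noise term e_t. The helper names simulate_ar1 and fit_ar1 are hypothetical, and the fit is ordinary least squares rather than the statsmodels fit used in the notebook.

```python
import numpy as np


def simulate_ar1(n, phi, sigma, rng, x0=0.0):
    """Simulate n steps of x_t = phi * x_(t-1) + e_t, with e_t ~ N(0, sigma)."""
    x = np.empty(n)
    prev = x0
    for i in range(n):
        prev = phi * prev + rng.normal(0.0, sigma)
        x[i] = prev
    return x


def fit_ar1(x):
    """Least-squares estimate of (constant, phi, innovation std) for an AR(1)."""
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (const, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    resid = x[1:] - design @ np.array([const, phi])
    return const, phi, resid.std(ddof=2)


rng = np.random.default_rng(0)
# Within-night values from the text: L1 term 0.7, innovation std 0.05.
within_night = simulate_ar1(20000, phi=0.7, sigma=0.05, rng=rng)
const, phi, sigma = fit_ar1(within_night)
print(const, phi, sigma)
```

For a stationary AR(1) the long-run standard deviation is sigma/sqrt(1 - phi^2), so the within-night values above imply a scatter of about 0.07 in the residual log r_0.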

Remember that these fits are based on log r_0, not directly the FWHM values.

These fits are then used to generate the seeing databases attached above by: reading the processed DIMM data (which has already-calculated r0 values); if a data point is in the DIMM data, using that; and for data points between DIMM data, generating samples using the AR fits.

It's not clear to me how the damped random walk is made to be consistent between the interpolated points, but that may be because I don't fully understand the damped random walk stuff.
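
One simple possibility, shown here only as a sketch and not as a description of what simsee does, is to run the AR(1) recursion forward through each gap, starting from the last measured residual. The function name fill_gaps_ar1 and the toy residual array are assumptions; the phi and sigma values are the within-night numbers quoted above. This forward-only approach is not conditioned on the next measured point, so there can be a jump where a gap ends, which is exactly the consistency question raised here.

```python
import numpy as np


def fill_gaps_ar1(residuals, phi, sigma, rng):
    """Replace NaN entries with forward AR(1) samples; keep measured values."""
    filled = np.array(residuals, dtype=float)
    prev = 0.0  # assume the process starts at its mean if the first point is missing
    for i in range(len(filled)):
        if np.isnan(filled[i]):
            filled[i] = phi * prev + rng.normal(0.0, sigma)
        prev = filled[i]
    return filled


rng = np.random.default_rng(1)
# Toy residual log r_0 series on a regular grid; NaN marks missing DIMM data.
observed = np.array([0.02, 0.03, np.nan, np.nan, np.nan, -0.01, np.nan, 0.00])
filled = fill_gaps_ar1(observed, phi=0.7, sigma=0.05, rng=rng)
print(filled)
```

A version that also conditions on the measurement at the far end of each gap would remove the jump, at the cost of a more involved sampling step.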

It's worth pointing out here that Bo Xin has a paper to be submitted on the PSF measured from SDSS. In it, he finds that a structure-function-like quantity is a better predictor of the FWHM (it ends up being similar to a damped random walk, but with different exponents, I think). From the conclusions of the paper: "The power spectrum of the temporal behavior is found to be broadly consistent with a damped random walk model with characteristic timescale in the range ∼5−30 minutes, though data show a shallower high-frequency behavior. The high-frequency behavior can be quantitatively described by a single power law with index in the range −1.5 to −1.0. A hybrid model is likely needed to fully capture both the low-frequency and high-frequency behavior of the temporal variations of atmospheric seeing." This was only for variations in the seeing within a night.

My takeaways: Perhaps we should remodel the opsim inputs to use r0 instead of DIMM FWHM. At the least, we need to understand what the "DIMM FWHM" values we're using are (especially whether they are interpreted using a Kolmogorov or a von Karman model). Currently sims_seeingModel expects these to be Kolmogorov values. I believe the 'FWHM' values in these databases are von Karman values. We could reconfigure them to match Kolmogorov values.
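
If the stored values do turn out to be von Karman FWHMs, converting them back to Kolmogorov values means inverting the Tokovinin (2002) eq. 19 relation, which has no closed form. A minimal sketch using bisection is below; the function names, the 0.05" to 20" bracket, and the tolerance are assumptions for illustration, and the 30 m outer scale must match whatever was used to create the values.

```python
import numpy as np


def r0_to_kolmogorov_fwhm(r0, wavelength=5.0e-7):
    """Kolmogorov FWHM (arcsec) from a Fried parameter r0 (m)."""
    return np.degrees(0.98 * wavelength / r0) * 3600.0


def von_karman_fwhm(fwhm_kol, outer_scale=30.0, wavelength=5.0e-7):
    """von Karman FWHM (arcsec) from a Kolmogorov FWHM (arcsec)."""
    r0 = 0.98 * wavelength / np.radians(fwhm_kol / 3600.0)
    return fwhm_kol * np.sqrt(1.0 - 2.183 * np.power(r0 / outer_scale, 0.356))


def kolmogorov_from_von_karman(fwhm_vk, lo=0.05, hi=20.0, tol=1e-8):
    """Invert von_karman_fwhm by bisection; the relation is monotonic on [lo, hi]."""
    if not von_karman_fwhm(lo) <= fwhm_vk <= von_karman_fwhm(hi):
        raise ValueError("fwhm_vk is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if von_karman_fwhm(mid) < fwhm_vk:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip for an illustrative 0.8 arcsec Kolmogorov FWHM.
fwhm_vk = von_karman_fwhm(0.8)
fwhm_kol = kolmogorov_from_von_karman(fwhm_vk)
print(fwhm_vk, fwhm_kol)
```

Storing r0 instead of either FWHM, as suggested above, would avoid the ambiguity, since r0_to_kolmogorov_fwhm and von_karman_fwhm can both be applied to it on demand.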

We should look directly at the DIMM measurements as well. I'd like to know what the gaps looked like. We should also check if there is a difference between those that LSST systems engineering would endorse and the ones used in Eric's model.

We should investigate the poor seeing tails of the distribution -- how much worse is the seeing likely to get compared to the model here? If the gaps in the DIMM are small/infrequent enough, most likely this tail is not that important.

We need to check the mean/median/tails of this distribution compared to the previous distribution, and compared to mean values being used for system evaluation (i.e. fiducial m5 values).

That said, I do think we need to move to something like this shortly. The seasonal variations are important. (Note that the overall scale of the FWHMeff here is a little hard to interpret, given that I don't know how the FWHMeff for the "remodeled seeing" was calculated. If it used our standard seeingModel, then the input is not quite right. But the overall fact that there is a trend we're not including is important.)

rhiannonlynne commented 6 years ago

I had a brief email conversation with Bo Xin about what went into the previous FWHM500 values. He felt that the outer scale was incorporated, which suggests that the previous FWHM500 values were indeed von Karman-interpreted values.

ivezic commented 6 years ago

It would be good to get to the bottom of a rumor that Eric N. found a worse seeing distribution with the latest data than Chuck's initial measurements quoted in the SRD.

ehneilsen commented 6 years ago

Yes, I agree it would be good to understand why there is a difference. I have not been able to track down exactly which data were the source of either the SRD values or the default opsim4 seeing database, or what model was used. That there is a difference is not very surprising, though. There are multi-year trends apparent in the "raw" DIMM data: [figure: monthly_raw_dimm]

Note that the DIMM data is just as supplied by Edison Bustos, and uses a Kolmogorov model to derive the FWHM, so the numbers aren't directly comparable. However, the trends should be the same. If the original estimates were made based on data from 2006-2012, I would not be at all surprised if they were better than estimates that also included data from 2004-2005 or 2013 and later.

ehneilsen commented 6 years ago

To derive the FWHM values, I use equations 5 and 19 of Tokovinin (2002) (2002PASP..114.1156T). I have not included some of the other corrections described in that paper, as I do not know the details of the Pachon DIMM (e.g., exposure time and read-out noise) well enough, and these might lead to my over-estimating the FWHM.

ehneilsen commented 4 years ago

Here is a new round of simulated seeing opsim databases, after fitting to the latest DIMM data and applying some updates to the simsee code: simsee_pachon_58771_16.db.gz simsee_pachon_58771_13.db.gz

ehneilsen commented 4 years ago

This is another round of updated seeing simulations, the same as a week ago (simsee_pachon_58771_*) but 9 months longer (for simsee_pachon_58777_13.db.gz) or 3 years, 9 months longer (for simsee_pachon_58777_16.db.gz). simsee_pachon_58777_16.db.gz simsee_pachon_58777_13.db.gz

rhiannonlynne commented 4 years ago

https://docs.google.com/presentation/d/1GEazuX-iKM1IttHlo5S3s7pyVkWfk0EVEDvhMlbaDjA/edit#slide=id.g6b3c16542c_0_91

rhiannonlynne commented 4 years ago

I will adopt simsee_pachon_58777_13 as the default database for now. The _16 database is slightly longer, but places the worst weather year in the last year of the survey; this will be really useful to simulate, but I suspect that having a worse weather year earlier would be better for understanding survey strategy (particularly if we want to adjust to the worse weather).

rhiannonlynne commented 4 years ago

simsee_short_summary.pdf

rhiannonlynne commented 4 years ago

New seeing database added as default and new example added: https://github.com/lsst/sims_seeingModel/pull/5
