Include systematics in flux error margins

timstaley commented 9 years ago

cf https://redmine.astron.nl/lofar_issuetracker/issues/5679

Currently we only record the estimated errors from the Gaussian fit to a source profile. However, we know that due to calibration / ionosphere variations etc. etc. absolute fluxes can be out by as much as 5-10%. We should add a configuration parameter to inflate errors accordingly, much as we already do for positional errors (ew_sys_err, ns_sys_err).

jbroderick1 commented 9 years ago

I think the easiest way to do this is using a configuration parameter, as you suggest.

Although it would be great if TraP could somehow work this out from the median fractional variability using all the sources in a given dataset. At least that is what I do for e.g. the SS433 field.

AntoniaR commented 9 years ago

I do not think this should be included in Release 2.1.

There are a lot of potential problems with the variability statistics that could appear by implementing this. I think we need to do more testing to be confident that we are doing the right thing - it seems to be right but we need to be certain. Unless someone is willing to also look into the statistics and do this type of testing on the timescale of release 2.1 (I would be willing to investigate but not on that timescale), I would recommend pushing back to release 3.0.

timstaley commented 9 years ago

No need for alarm - with what I have in mind, you will just be able to set this to 0 (switch it off) in the configs... can discuss Thursday.

AntoniaR commented 9 years ago

That is absolutely how to implement it, however I still think this should not be put into TraP until we have thought about it more. There is no point in implementing something quickly that can easily get misused and may be the incorrect way of dealing with the issue.

Here are a number of the issues that I'm concerned about from the top of my head...:

should the error just simply be a fixed % of the flux and added to the observed error?
should the error be a fixed % of the flux and added in quadrature to the other errors from PySE?
should it be a fixed % or something flux/field/calibration/imaging/frequency dependent?
should TraP automatically figure it out on a dataset level as Jess suggests?
should we actually try to quantify it analytically - i.e. get a better understanding of the origin of the uncertainties and then base it on that? e.g. quantify the calibration component, the ionospheric component, etc...
is there a different solution which does not try to correct for this but provides a work-around?

Basically the flux uncertainties are extremely difficult to quantify and can depend on many different things which are dataset dependent. It could potentially lead to statistical issues with the population of source flux errors - just the thing we want to use for variability analysis... - so it needs checking. In the ideal world we should do option 5.

About a year ago, I did some very preliminary work for this but stopped because it was not as simple as adding a % and there was a much simpler option - I am now favouring option 6. The ultimate aim of the pipeline is to identify which sources are variable from those which are not. Using the plots of variability parameters against flux (see my paper) it is extremely easy to identify the trend that stable sources have with respect to flux, and then it is simply a case of how you query the database. This means we do not have to try to quantify errors which can be extremely difficult to do.

This issue is very important as we really do need to think about the quoted errors but this requires a lot more effort. That is why I think this should not go into Release 2.1, unless there is someone who can dedicate time to this investigation within the timescales of Release 2.1.

timstaley commented 9 years ago

Ok, let me set out the solution I had in mind:

I was thinking of implementing two parameters. One is a multiplicative error - a fraction of a source's flux - due to flux calibration uncertainties (fairly standard error source for radio, no? I think it's quite well understood for e.g. AMI, though I'm less sure about LOFAR). This is added in quadrature. The other would be an additional 'baseline error', basically an additional constant that is added in quadrature. There's no hard reasoning for the latter, other than I figured I might as well implement it at the same time in case in comes in handy - thought I'd suggest it and see if anyone else think's it's a good idea - but we can/will leave it out if no-one sees a good reason for it.

So, in the config file

#Systematic calibration uncertainty:
frac_flux_sys_err = 0.1 ; (Default setting is 0.0, consistent w/ R2.0. Dimensionless - proportion of flux.)
# and if we decide it's a good idea also...
additive_flux_sys_err = 0.0 ; (Default setting is 0.0, consistent w/ R2.0. Units of Janskys)

And in the code

flux_err_total = sqrt( flux_err_fit**2 + (flux * frac_flux_sys_err)**2 
                                                        + additive_flux_sys_err**2)

I would not suggest any automatic calculations - it's up to the user-scientists to decide what the right values should be for a given dataset. (Though e.g. Jess might run his analysis first to decide how this should be set). Ideally the scientist's decision is informed by analytic error estimates for their telescope of choice. Note that the config values will be recorded in the database (that reminds me - another Banana enhancement would be to display these values EDIT: https://github.com/transientskp/banana/issues/75).

This should be used with care, but I think that not implementing it because we don't understand the LOFAR errors well enough yet is counter-productive - we already know that we're not correctly allowing for systematics in variability calculations for e.g. AMI data, so this would move us closer to being 'analytically correct'.

I agree it's worth considering alternative implementations though. Ideally we would store the flux error from the fit only (flux_err_fit above) separately so it's readily available for those who want to check it. We could certainly do that if we change the schema to add an extra column for each flux (peak, integrated). Alternatively, you could conceivably just store these parameters per-dataset and use them in an alternative set of variability calculations. That might be pretty neat actually, since you win by not storing any additional data, and still use the full error estimate in the variability calcs - but would require significant re-engineering of the database code, so it's not going to happen anytime soon.

@jbroderick1, I think you requested this feature, and you're much better versed in radio data than me - can you have a read through and see if the above makes sense?

jbroderick1 commented 9 years ago

I like the solution that is proposed above.

I fully agree that we don't understand LOFAR flux density uncertainties, and a lot of my attempts are almost pure guesswork at sensible values (e.g. for the redback RSM and SS433 papers).

However, from my (very limited) use of the TraP, I worry about certain cases where sources have extremely high eta_nu values, and then I look at the light curve and the fluxes might be e.g. 99.7,99.6,99.7,99.8,99.5 Jy etc etc. Because the calibration error is not being considered, can you really trust the resulting variability statistics? Apart from eta_nu, my understanding is that the fractional variability should also be debiased for the errors too (e.g. Martin's MWA paper)?

I think the way that the big radio surveys have done this in the past is to monitor the flux densities of the secondary calibrators, calculate the standard deviations, and then take the average of the set (after throwing away the sources that vary a lot intrinsically). I've sort of followed this approach in my LOFAR papers, looking at reasonably bright sources and seeing how stable they are.

To start off with, including all the uncertainties within one (or two) terms is okay, I think.

I would like to see this implemented asap, especially if it is an option where the user can switch it on and off.

AntoniaR commented 9 years ago

Originally, when I posted the issue, I would've completely agreed with everything discussed above.

I have actually implemented a % uncertainty in some tests on the variability parameters about a year ago after I filed this issue and it does not work as expected. I find it is far, far easier to interpret the variability results, for instance using the plots I make, where you accept there is a flux dependence. Indeed, no sources should be identified as variable by using eta alone - we need to use both variability parameters and you will quickly spot that the V parameter for a source at 99 Jy is a significantly below the average population (often >1sigma lower the mean V, typically an order of magnitude). So even without taking into account the flux dependence on the parameters, you can easily distinguish real variables from the population by using cuts on both eta and V (and this is what TraP is designed for...).

Personally, I would say it is better to have a known unknown (i.e. we can see the correlation with flux and correct for it in the variability analysis) than what would rapidly become an unknown unknown as we'd be hiding issues behind a fixed value which may or may not be accurately describing the true situation.

From discussions I've had with Martin (he spends a lot of time thinking about flux uncertainties), it is really not clear to me which method (correct observed errors or model the flux dependence variability parameters) gives the best result. But it is very clear that trying to correct the errors is a big problem as over/under estimating them can play havoc with the variability statistics of the whole population. So if we're going to correct them, we need to be convinced that we're doing it right. The method outlined above is potentially the right way, but I think we need to be able to justify it before it becomes part of an official release - either with observational data showing that the variability statistics behave as they should when it is applied or by analytically figuring out what the correction should be.

Ultimately, I would be very happy if this is implemented in a separate branch for people to thoroughly test (and that would probably help answer a lot of my concerns) but, as you can probably guess :), I am not happy with it going into an official release yet.

AntoniaR commented 2 months ago

This is an ongoing wish list item but not essential for R7. Moving to the long term milestone.

HannoSpreeuw commented 2 months ago

PySE's methodology of taking account of calibration errors and biases has been copied from the NVSS paper.

So what is implemented wrt positional uncertainties mimics equations 27 of that paper, with $\epsilon_{\alpha}$ mapped onto eps_ra while $\epsilon_{\delta}$ is represented by eps_dec. The NVSS paper says:

These image offsets remain in the raw NVSS catalog, but they have been removed 
from the NVSS user-catalog positions

We have not removed any offsets (=biases) so they are propagated through ew_sys_err and ns_sys_err. These terms refer to east-west systematic error and north-south systematic error, see commit 1163684.

Similarly, equation 43 from the NVSS paper, which carries both calibration uncertainties and biases for the peak spectral brightness, is represented by the formula voor errorpeaksq

        errorpeaksq = ((self.frac_flux_cal_error * peak) ** 2 +
                       self.clean_bias_error ** 2 +
                       2. * peak ** 2 / rho_sq3)

To me it seems that the systematic flux errors required here are provided unless the NVSS methodology does not suffice for our needs. Please comment if this is the case.

However, in the current PySE all these calibration uncertainties and biases have to be added in the code, i.e. they are not part of an the ImageData class instantiation, which is the primary contact point for interaction with images. Surely, this is not very convenient.

What we should do is to add quantities like frac_flux_cal_error and clean_bias_error to the job template and make sure they are propagated to the ImageData class and its methods.

Secondly, as part of the eScience Software & Sustainability plan for PySE, we want to accommodate for configuration files for a standalone PySE, similar to job_params.cfg. These configuration files are needed, because the number of possible command line arguments is already pretty large.

For the latter reason, I will turn this into a PySE issue.

transientskp / pyse

Include systematics in flux error margins #71