TarekHC opened this issue 8 years ago
Hi Tarek, as a first approach, the check on spectra that you're suggesting sounds good to me. However, I am not sure that the feedback coming from tests based on simulation only (e.g. CTA) can be as effective as that coming from real data (e.g. HESS, MAGIC, VERITAS). For this reason, I would concentrate first on tests performed on real data.
Hi @SaverioLombardi ,
Indeed. Real data will be key here. The experience of HESS and VERITAS will be essential. You know how much I love MC... :blush:
Although we must also validate the DL3 format for CTA, and that requires running some tests without real data. It could be relevant to define the length of CTA DL3 bunches (key, for example, to estimating the computing resources for the run-wise MC), to understand how complex IRF3 will be, and things like that... The current generation of IACTs does not have such tight requirements, so we will probably need both.
Well, in HESS, 100 people have done all kinds of studies for over a decade, so I'm sure many studies and attempts to quantify IRF and high-level analysis result precision have been done.
But when it comes to concrete methods or scripts to check the DL3 data and IRFs we produce, there's almost nothing there yet!
At the moment we're comparing high-level source results (position, extension, spectral parameters) for a handful of test sources between different chains / configs / tools. @joleroi has a script that does these batch analyses for target lists with Gammapy, @mimayer has one that does them with ctools, both haven't landed in the public Gammapy or ctools repo yet I think.
I once started an EventListDatasetChecker in Gammapy where I compared times and coordinates (RA / DEC / ALT / AZ / DETX / DETY) obtained via Astropy / Gammapy against the HESS software. This is old and outdated; I'm just mentioning it as an example of how we could implement common tools to check IACT DL3 data. Both for format and for content, I think Python scripts are the way to go. They should be open source to be re-usable for all IACTs. I don't care much at the moment whether this development happens in Gammapy, ctapipe, Gammalib, in this spec repo, or in yet another repo. For personal reasons I have a slight preference for Gammapy, and pull requests there for an IRFChecker (anything, both format and content) are welcome. It should also be exposed as a command-line tool, e.g. called gamma-dl3-police or something.
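To make the idea concrete, here is a minimal sketch of such a content check along the lines of the EventListDatasetChecker: compare per-event quantities produced by two chains within a tolerance. The function name, column names and tolerance are illustrative assumptions, not the actual Gammapy API.

```python
# Hypothetical sketch of an event-list content checker: compare
# per-event quantities (here RA / DEC, in deg) from two chains.
def check_columns(events_a, events_b, columns=("RA", "DEC"), atol=1e-4):
    """Return a list of human-readable issues; an empty list means OK."""
    issues = []
    for col in columns:
        a, b = events_a[col], events_b[col]
        if len(a) != len(b):
            issues.append(f"{col}: length mismatch ({len(a)} vs {len(b)})")
            continue
        # Indices of events whose values disagree beyond the tolerance
        bad = [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > atol]
        if bad:
            issues.append(f"{col}: {len(bad)} events differ by more than {atol} deg")
    return issues

chain_a = {"RA": [83.63, 83.64], "DEC": [22.01, 22.02]}
chain_b = {"RA": [83.63, 83.64], "DEC": [22.01, 22.52]}  # one discrepant DEC
print(check_columns(chain_a, chain_b))
```

In a real tool the two dicts would be event lists loaded from FITS, and the report lines would go into the text/HTML summary discussed below.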
As I've mentioned at the Meudon meeting, in HESS we have these index files for each "FITS production", so really what I'm looking for is something that takes such a production (DataStore in Gammapy) and runs a series of checks, producing a text or HTML report about issues.
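A sketch of what "take a production and run checks, producing a report" could look like. In practice the production would be a Gammapy DataStore; here it is modelled as a plain list of per-observation dicts, and the required keys and check logic are assumptions for illustration.

```python
# Hypothetical "gamma-dl3-police"-style report generator over a production.
REQUIRED_KEYS = {"OBS_ID", "LIVETIME", "RA_PNT", "DEC_PNT"}  # assumed subset

def check_observation(obs):
    """Run simple per-observation checks; return a list of issue strings."""
    issues = []
    missing = REQUIRED_KEYS - obs.keys()
    if missing:
        issues.append(f"missing keys: {sorted(missing)}")
    if obs.get("LIVETIME", 0) <= 0:
        issues.append("LIVETIME must be positive")
    return issues

def make_report(observations):
    """Collect all issues into a plain-text report."""
    lines = []
    for obs in observations:
        for issue in check_observation(obs):
            lines.append(f"OBS_ID {obs.get('OBS_ID', '?')}: {issue}")
    return "\n".join(lines) if lines else "All checks passed."

production = [
    {"OBS_ID": 1, "LIVETIME": 1580.0, "RA_PNT": 83.6, "DEC_PNT": 22.0},
    {"OBS_ID": 2, "LIVETIME": -5.0, "RA_PNT": 83.6, "DEC_PNT": 22.0},
]
print(make_report(production))
```

The same loop structure would extend naturally to per-HDU format checks and to an HTML output backend.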
I think the needs for CTA at the moment are different. There it's more about developing tools that compute required MC statistics and good binnings in parameters to obtain a given precision. Some of the code will be useful for existing IACTs also, but it's also kind of a different study.
So I don't know what this means for us here. Either we continue to chat a bit about it here and we all do our own stuff more or less. Or someone (probably from CTA) sets up a task group on this and tries to organise the work (kick-off f2f meeting, then monthly telcons) ... if you can get enough contributions. In my opinion we can cover format validation in the monthly calls we have now, but when it comes to content validation and IRF checks and data quality, that's a large separate task that we can't cover in one telcon a month.
Also -- every IACT, chain, config and IRF is different. See the backup slides 12 and 13 with examples of IRF issues from HESS here. Some chains in HESS do MC simulations at fixed energies, others continuously in energy. Some configs do gamma-hadron separation in energy and offset bands, and then the responses have steps. This is a long-known issue (well, internally) for HESS spectra without a good solution.
So what I'm saying is: yes, we should develop some methods and tooling to check IRFs. But the re-use factor will be limited, large time investments are needed to produce good, well-understood IRFs for every IACT / chain / config. Is there a task group in CTA to work on this?
Thanks @cdeil !
> Also -- every IACT, chain, config and IRF is different. See the backup slides 12 and 13 with examples of IRF issues from HESS here. Some chains in HESS do MC simulations at fixed energies, others continuously in energy. Some configs do gamma-hadron separation in energy and offset bands, and then the responses have steps. This is a long-known issue (well, internally) for HESS spectra without a good solution.
I'm thinking exclusively about validating the format. Different analysis chains would produce different DL3 datasets, which could be compared and cross-validated. But to validate the format itself you probably need methods that prove you are not actually losing information in the process.
> At the moment we're comparing high-level source results (position, extension, spectral parameters) for a handful of test sources between different chains / configs / tools. @joleroi has a script that does these batch analyses for target lists with Gammapy, @mimayer has one that does them with ctools, both haven't landed in the public Gammapy or ctools repo yet I think.
I was thinking about something like this.
> So what I'm saying is: yes, we should develop some methods and tooling to check IRFs. But the re-use factor will be limited, large time investments are needed to produce good, well-understood IRFs for every IACT / chain / config. Is there a task group in CTA to work on this?
Definitely not. And it may be unrealistic to think people will devote their efforts to it. Also, the scope of such common tools may simply be too wide.
But, thinking about CTA, the fact that we need to validate the format itself worries me. I guess it may be validated together with the methods that will, in the future, validate the CTA "blessed" science tools.
> But, thinking about CTA, the fact that we need to validate the format itself worries me. I guess it may be validated together with the methods that will, in the future, validate the CTA "blessed" science tools.
It's my fault, but we're talking about very different things in this one thread and it's getting confusing:

1. format validation -- checking that produced files match this spec;
2. content validation -- IRF checks and data quality.

Should we focus on 1 for now and leave 2 to the future and some other thread?
@TarekHC - You say "the fact that we need to validate the format itself worries me". I think it's completely normal, and a need for any format as soon as you have several producers. When HESS or MAGIC or CTA or ... produces FITS data that is supposed to match this DL3 spec, it's nice to have a validation tool that checks that the format is OK. By format I just mean checking that all required header keywords and columns are present. This could then be extended to simple data type and value checks, like livetime should be a positive floating point number, even if those simple value checks already move a little bit towards content validation.
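A minimal sketch of what such pure format validation could look like: check required header keywords and columns of an EVENTS HDU, plus one simple value check. The HDU is modelled here as a header dict plus a column-name list; the keyword and column names are a loose illustrative subset, not the authoritative spec.

```python
# Sketch of DL3 format validation for an EVENTS HDU (assumed key names).
REQUIRED_HEADER = ["HDUCLASS", "HDUCLAS1", "OBS_ID", "LIVETIME"]
REQUIRED_COLUMNS = ["EVENT_ID", "TIME", "RA", "DEC", "ENERGY"]

def validate_events_hdu(header, columns):
    """Return a list of format errors; an empty list means the HDU passes."""
    errors = []
    for key in REQUIRED_HEADER:
        if key not in header:
            errors.append(f"missing header keyword {key}")
    for col in REQUIRED_COLUMNS:
        if col not in columns:
            errors.append(f"missing column {col}")
    # Simple value check, already bordering on content validation:
    livetime = header.get("LIVETIME")
    if isinstance(livetime, (int, float)) and livetime <= 0:
        errors.append("LIVETIME must be a positive number")
    return errors

header = {"HDUCLASS": "GADF", "HDUCLAS1": "EVENTS",
          "OBS_ID": 23523, "LIVETIME": 1580.0}
columns = ["EVENT_ID", "TIME", "RA", "DEC", "ENERGY"]
print(validate_events_hdu(header, columns))  # []
```

With astropy, `header` and `columns` would come straight from `astropy.io.fits` HDU objects; the check logic stays the same.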
The other nice tool to have would take a FITS file or list of FITS files and list or summarise as text on the console what they contain (i.e. which HDUs are in formats described in this spec, what they contain, and which aren't). In Gammapy we have started to implement the gammapy-data-show command line tool (mainly used so far to debug exported HESS IRFs). It's not implemented well ... as soon as we have the declarative scheme here with HDUCLAS2 required, it should go via a registry. At the moment the user has to say what format a given HDU is in (see here).
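The registry idea could be sketched like this: map HDUCLAS2 values to known spec formats, so a summary tool can report which HDUs match the spec and which don't. The HDUCLAS2 values and format names below are illustrative assumptions, not the spec itself.

```python
# Sketch of an HDUCLAS2-based format registry for a summary tool.
# The registry entries are assumed, not taken from the spec.
KNOWN_FORMATS = {
    "EVENTS": "event list",
    "EFF_AREA": "effective area",
    "EDISP": "energy dispersion",
    "RPSF": "point spread function",
}

def summarize_hdus(hdu_headers):
    """hdu_headers: list of (name, header-dict). Return summary lines."""
    lines = []
    for name, header in hdu_headers:
        hduclas2 = header.get("HDUCLAS2")
        kind = KNOWN_FORMATS.get(hduclas2, "not described in this spec")
        lines.append(f"{name}: HDUCLAS2={hduclas2} -> {kind}")
    return lines

hdus = [("EVENTS", {"HDUCLAS2": "EVENTS"}),
        ("AEFF", {"HDUCLAS2": "EFF_AREA"}),
        ("MYSTERY", {})]
for line in summarize_hdus(hdus):
    print(line)
```

With a declarative HDUCLAS2 scheme, the user no longer has to declare the format of each HDU by hand; the registry lookup decides.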
Also, before we keep discussing further: @jknodlseder - do you think it's useful to talk about "Development of common tools for data format validation" in the issue tracker of this repo? Or do you think we should limit the scope here to format discussions, and move tool discussions over to the Gammalib or Gammapy trackers?
Hi all,
As we discussed in the telcon, we should define a couple of methods for DL3 data format validation. Apart from the use cases that the format will need to cover, we should have methods to estimate the systematic uncertainty we are adding by using certain approximations.
Also, the current generation of IACTs will dedicate some effort to validate their own DL3 products, so there is a nice synergy on this task from which we could profit.
I guess an obvious example is to fit the measured spectrum of a known source (the Crab Nebula for MAGIC) with different assumptions (adding or removing some IRF dependencies, for example).
In the case of CTA, we could generate event lists from several IRFs with small differences (for example, the zenith angle variation expected in ~30') and measure the effect on the calculated spectrum of using an averaged IRF.
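A rough numeric sketch of that averaged-IRF systematic: fold a power-law spectrum with two slightly different effective-area curves (standing in for IRFs at two nearby zenith angles) and compare the predicted counts against those from the averaged curve. All numbers below are made up for illustration, not real CTA IRFs.

```python
# Toy estimate of the systematic introduced by averaging two IRFs.
energies = [0.1, 0.3, 1.0, 3.0, 10.0]   # TeV, illustrative bin centres
aeff_z20 = [0.5, 2.0, 5.0, 6.0, 6.5]    # effective area (arb. units), zenith 20 deg
aeff_z25 = [0.4, 1.8, 4.8, 5.9, 6.4]    # slightly different, zenith 25 deg

def predicted_counts(aeff, index=2.5):
    # Per-bin product of a power law dN/dE ~ E^-index with the effective area.
    return [a * e ** (-index) for a, e in zip(aeff, energies)]

aeff_avg = [(a + b) / 2 for a, b in zip(aeff_z20, aeff_z25)]
n_true = predicted_counts(aeff_z20)     # counts with the "correct" IRF
n_avg = predicted_counts(aeff_avg)      # counts with the averaged IRF
max_frac = max(abs(t - m) / t for t, m in zip(n_true, n_avg))
print(f"max fractional effect of averaging: {max_frac:.1%}")
```

Even in this toy, the effect is largest at threshold, where the effective area changes fastest with zenith angle; a real study would propagate this through a spectral fit.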
@cdeil Were similar validations done for HESS? Or just comparison between standard analysis tools and DL3 analysis products?