open-gamma-ray-astro / gamma-astro-data-formats

Data formats for gamma-ray astronomy
https://gamma-astro-data-formats.readthedocs.io
Creative Commons Attribution 4.0 International
29 stars 27 forks source link

Energy true/estimated binning in "aeff_2d" #35

Closed TarekHC closed 8 years ago

TarekHC commented 8 years ago

Hi all,

Aren't we currently "imposing" effective area vs true energy to have the same binning as the effective area vs estimated energy? Isn't that completely misleading?

I would propose to add another pair of columns defining the true energy binning (even if its identical).

I edit it further: The energy column is always labelled as "Energy axis". I would propose to always specify to which energy we are referring to. Specially in the case of IRFs.

jknodlseder commented 8 years ago

Good point.

But if we do that, the question is whether we should not simply store that information in another extension (or have the energy type as a keyword)?

Another question if we should formally support this way of approximating the energy dispersion?

TarekHC commented 8 years ago

Hi @jknodlseder!

One quick question: Does ctools currently use both effective areas?

We are providing the energy dispersion, so going from TENERGY -> RENERGY is trivial. Couldn't we just provide the effective area vs true energy?

Again, we could go even further. We could multiply the mig. matrix with the effective area in bins of estimated energy to create the gamma ray acceptance as a function of both true and estimated energy. That would allow us to reduce (marginally) IRF3 volume (as the mig matrix would actually be within the effective area matrix).

Although this last idea (coming from Abelardo here at IFAE) would probably not be similar to current standards... And probably our current approach is more organized.

jknodlseder commented 8 years ago

No, ctools does always use the true effective area and handles energy migration correctly.

I'm wondering whether we loose in fact information in the approach you propose, because you need to do a convolution and not a multiplication.

TarekHC commented 8 years ago

If we only use effective area vs true energy, then why should we add the other one? I would remove it then.

I think there is no loss of information. In fact, I think MAGIC in some steps of the analysis does something similar. From the integral of each slice you get the effective area of that bin, and scaling the gaussian with that value will give you back the mig. matrix. But again, that is probably a bit misleading also...

cdeil commented 8 years ago

@TarekHC - Thanks for bringing up the issue.

This is a tricky point and we need to find a good solution and document it well.

How does Fermi-LAT support the analysis case where energy resolution isn't applied by the ST in their IRFs?

If AEFF versus ETRUE is used, this results in a shift of ~ 5 - 10 % in flux (or equivalently energy scale) for constant AEFF and power-law spectrum, depending on the energy resolution and source spectral index. So Fermi-LAT must do something to handle this, no?

woodmd commented 8 years ago

We've documented how the STs currently handle energy dispersion on this page:

http://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Pass8_edisp_usage.html

As you point out when the energy dispersion correction is disabled there is a 5-10% systematic shift of the measured flux. This is why we currently recommend to enable the energy dispersion correction to minimize systematic uncertainties (it's turned on by default in fermipy).

It would probably be better to have some sort of first-order correction along the lines of what Tarek is proposing so that you can minimize the systematic error even when dispersion correction is disabled. Using an acceptance for Index=2.0 would probably give you something accurate to 2-3% level above 100 MeV. Since we already provide a mechanism to handle energy dispersion correctly I don't think adding this functionality is currently a big priority for us.

I haven't followed the discussions on data formats but with respect to the original question I would think that you definitely want different binning in true vs reconstructed energy. At a minimum the true energy binning needs to be wider than the binning in reconstructed energy to account for events that migrate into your reconstructed energy range.

TarekHC commented 8 years ago

Hi @woodmd! Thinking about current generation of IACTs energy resolution, and future CTA requirements on systematics, this is definitely an issue...

So then there seems to be several simple options to modify the current effective area format (excluding the mig. matrix + acceptance):

In principle, I would go for the first one. @jknodlseder, is it really not necessary within ctools? @cdeil, what do you think? Do you actually use it anywhere (with HESS/gammapy)?

Although as a more general comment, I would still make the effort of being more precise in the description of the energy in each IRF in the documentation. Maybe it could be an action item to add to the ones of the f2f meeting, that I could take care of (if you all agree, of course).

jknodlseder commented 8 years ago

I agree that we need to think about correcting for the systematic uncertainty.

The question is whether we should do this in the data or in the software (doing it simply in software avoids storing the information, but makes the software more complicated). It would be good to understand how the information is in fact generated, i.e. whether the ERECO effective area comes easily out from the MC pipelines, or whether some complex computation is needed. If it's a natural byproduct of the MC I would be in favor of storing it, if complex computations are needed one could decide to leave this to the Science Tools software.

I don't like however the idea of adding a separate ERECO column in the same HDU, as this breaks the structure of the data format and creates a special case that will be difficult to handle. We would have:

I'm not sure that we need a different energy binning for ERECO (just interpreting the energy axis differently does pretty much the trick, and we can always add a few bins to make the total energy range wide enough).

jknodlseder commented 8 years ago

Should add: the current ROOT MC files in CTA have a table for ETRUE and another table for ERECO, hence both informations fall for the moment out of the MC pipeline. Is this also happening for other instruments?

TarekHC commented 8 years ago

I agree. Adding the ERECO in the same HDU may be even more misleading. So we can remove that option.

Should add: the current ROOT MC files in CTA have a table for ETRUE and another table for ERECO, hence both informations fall for the moment out of the MC pipeline. Is this also happening for other instruments?

But if it is not used by the analysis, why generating them? And we should minimize duplicated information. In the future, the analysis will improve in different directions, so I guess those IRF products should be generated at the analysis level, when those improvements are implemented. I have to think about this further, but for now, I see no reason why to add the EffArea vs ERECO.

cdeil commented 8 years ago

I know that we have extensively discussed this in HESS, but I can't find the issue or minutes with the conclusions now. I'll try to summarise here.

For reference, the current version of the spec is here.

In HESS we produce four different effective area IRFs out of MC events:

For the energy binning we use the same one for reco and true energy, and like @jknodlseder I'm 👎 on introducing new columns for those separately in one HDU (too complex), although I'm not sure the alternative I'll suggest below (separate HDUs and a header key flag) is better.

So now the question becomes which AEFFs we want to support. I think getting rid of the point-like versions at DL3 is acceptable. Those are in use a lot now, but they can be computed from the full enclosure AEFF and the PSF, we have an open issue in the HESS-internal tracker that consistency here should be checked.

So that leaves two versions, like the EFFAREA and EFFAREA_RECO we have now. I think we want to keep EFFAREA_RECO as the standard way to run analyses taking EDISP into account in an approximate way, without having to apply EDISP in the science tools (one less inner loop, fast and no issues with low stats MC production where EDISP is very noisy / inaccurate).

Still, the format we have now is non-optimal. It should be better explained why two versions are useful, and providing either should be optional. Maybe it's best to just introduce a header flag whether the AEFF is as a function of reco or true energy, and to split the two AEFF out into separate HDUs?

In Gammapy we currently have a flag which version the user wants to read (see here). But this is not yet exposed to the end-user scripts and there's no internal logic yet when to load and use EDISP (currently we always use the EFFAREA column and apply EDISP). I.e. this is one of the things we urgently want to have described well in the spec and then fixed in Gammapy to support the faster analysis using EFFAREA_RECO (cc @joleroi ).

@gernotmaier or @kosack - Any comments on which AEFF versions you want to produce for CTA and that should be reflected in the DL3 spec?

cdeil commented 8 years ago

PS: this is another example of something where I think a Github issue won't work well, because the issue is a bit too complex.

It requires someone collecting info what current IACTs do, talking to MC and pipe people from CTA what they want to do, think about if / how it affects other IRFs (e.g. is EDISP only from events inside RAD_MAX if point-source AEFF are used?) and then write down a one or two page proposal explaining the options how to handle it in DL3 with pros and cons and present it at a monthly telcon.

Of course, in the meantime, pull requests just clarifying why there are two columns and how they should be filled and used in the current spec are welcome and I'll merge them, to have a better spec for the upcoming HESS data release. The understanding is that this is a very preliminary version, a starting point for discussion.

TarekHC commented 8 years ago

Agree that adding additional energy columns to the HDU is a bad idea. Dropped.

But is there any reason why both should be present? I understand there should be a decision on which one to include, but I would find more reasonable if only one AEFF is present within each DL3 file. And also, not using ENERGY indistinctly for true/reco energy...

@cdeil: maybe we could define a label for these cases in which further debate/research is required. One person could volunteer to be the assignee and present a couple of slides on the topic during Seevogh telcons. If they are labelled, we can list the open issues that need discussion and push for that discussion to occur.

cdeil commented 8 years ago

But is there any reason why both should be present? I understand there should be a decision on which one to include, but I would find more reasonable if only one AEFF is present within each DL3 file. And also, not using ENERGY indistinctly for true/reco energy...

The first question is whether there is agreement that the use case for AEFF_RECO is strong enough for it to remain in the spec (as a way to represent combined AEFF and EDISP for the cases where one doesn't want to use separate EDISP, e.g. because of low MC stats or to have faster likelihood computation). I think the answer to this one is yes, it has been used e.g. in HESS for the past 10 years, so it should be possible with open-source tools also.

The next question is how to represent it in FITS. As I said in the last comment, there I think a separate HDU and flag in the header would be simplest.

Is it clearer now?

@cdeil: maybe we could define a label for these cases in which further debate/research is required. One person could volunteer to be the assignee and present a couple of slides on the topic during Seevogh telcons. If they are labelled, we can list the open issues that need discussion and push for that discussion to occur.

Good idea. How about this as a start? https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/labels

cdeil commented 8 years ago

This issue is now labeled "type: change request" and "status: needs detailed proposal".

TarekHC commented 8 years ago

The next question is how to represent it in FITS. As I said in the last comment, there I think a separate HDU and flag in the header would be simplest.

Is it clearer now?

Just to make sure I understand what you mean: You propose to add an additional HDU for the AEFF vs ERECO, in addition to the HDU of the AEFF vs ENERGY? I prefer this idea than current "aeff_2d" (it does not impose the same binning) but I would push for a decision, and add only one AEFF.

Good idea. How about this as a start?

https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/labels

Good!

cdeil commented 8 years ago

You propose to add an additional HDU for the AEFF vs ERECO, in addition to the HDU of the AEFF vs ENERGY?

Yes. And I think that was also what others in HESS thought when we discussed this a while ago. And a header key has to be added to declare which AEFF variant it is.

PS: We are not in the phase where format changes need to undergo long review and a decision by a board. @TarekHC - If you make a pull request now and others here think it's an improvement and no-one complains within a few days, I'll merge.

TarekHC commented 8 years ago

Yes. And I think that was also what others in HESS thought when we discussed this a while ago. And a header key has to be added to declare which AEFF variant it is.

If we add two AEFF to the same FITS files, I would not differentiate them (only) with a keyword. I would change the EXTNAME to clearly show the difference between both AEFF.

PS: We are not in the phase where format changes need to undergo long review and a decision by a board. @TarekHC - If you make a pull request now and others here think it's an improvement and no-one complains within a few days, I'll merge.

Ok. I'll try to write a proposal this week and pull request. We can discuss if the changes are reasonable there.

cdeil commented 8 years ago

If we add two AEFF to the same FITS files, I would not differentiate them (only) with a keyword. I would change the EXTNAME to clearly show the difference between both AEFF. I'll try to write a proposal this week and pull request. We can discuss if the changes are reasonable there.

You can make a recommendation for EXTNAME for the two HDU, but not make it a requirement. (whereas the header keyword can be a requirement)

If you were to make some EXTNAME required you would open up the much larger and longer discussion of how to do EVENT-IRF association that we had at the f2f meeting without conclusions, so IMO it's best to avoid this here.

GernotMaier commented 8 years ago

I agree that we should keep both effective areas (EFFAREA and EFFAREA_RECO): it really simplifies the analysis

On 17 Apr 2016, at 18:56, Christoph Deil notifications@github.com wrote:

I know that we have extensively discussed this in HESS, but I can't find the issue or minutes with the conclusions now. I'll try to summarise here.

For reference, the current version of the spec is here.

In HESS we produce four different effective area IRFs out of MC events:

• Binned in reco energy and true energy • With and without a RAD_MAX cut applied (point-like and full enclosure effective areas) For the energy binning we use the same one for reco and true energy, and like @jknodlseder I'm 👎 on introducing new columns for those separately in one HDU (too complex), although I'm not sure the alternative I'll suggest below (separate HDUs and a header key flag) is better.

So now the question becomes which AEFFs we want to support. I think getting rid of the point-like versions at DL3 is acceptable. Those are in use a lot now, but they can be computed from the full enclosure AEFF and the PSF, we have an open issue in the HESS-internal tracker that consistency here should be checked.

So that leaves two versions, like the EFFAREA and EFFAREA_RECO we have now. I think we want to keep EFFAREA_RECO as the standard way to run analyses taking EDISP into account in an approximate way, without having to apply EDISP in the science tools (one less inner loop, fast and no issues with low stats MC production where EDISP is very noisy / inaccurate).

Still, the format we have now is non-optimal. It should be better explained why two versions are useful, and providing either should be optional. Maybe it's best to just introduce a header flag whether the AEFF is as a function of reco or true energy, and to split the two AEFF out into separate HDUs?

In Gammapy we currently have a flag which version the user wants to read (see here). But this is not yet exposed to the end-user scripts and there's no internal logic yet when to load and use EDISP (currently we always use the EFFAREA column and apply EDISP). I.e. this is one of the things we urgently want to have described well in the spec and then fixed in Gammapy to support the faster analysis using EFFAREA_RECO (cc @joleroi ).

@gernotmaier or @kosack - Any comments on which AEFF versions you want to produce for CTA and that should be reflected in the DL3 spec?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub


Dr Gernot Maier Deutsches Elektronensynchrotron (DESY) Platanenallee 6, D-15738 Zeuthen, Germany

jknodlseder commented 8 years ago

Concerning names for that, we have so for each calibration data HDU the EXTNAME (the one you see when you open a FITS file) and the HDUCLASS2 keyword (corresponding also to the CAL_CNAM keyword in the index file that matches the CCNMxxxx keyword in the HDU (this is from OGIP, CAL/GEN/92-008).

So far we have:

These definitions comes from OGIP (see CAL/GEN/92-019). The EXTNAME is suggested, not required.

What is different between the two variants of the format is the meaning of energy. As far as I can see, OGIP always uses "ENERG" for true energy, hence I would also adopt this here (actually, our format seems to violate this for the energy dispersion extension where "ETRUE" is used; we probably should change this for consistency).

If would propose to use "ERECO" consistently for reconstructed energies. I don’t think that there is an OGIP standard for this (which is more focused on X-rays where this corresponds basically to detector channels).

So we could in principle detect the type of effective area by the column names.

I guess it would however also be good to change the EXTNAME and HDUCLASS2 (and CAL_CNAM) keywords to avoid any confusion. A suggestion would be:

We also need to think about whether we actually want to change the name of the effective area columns (so far: "EFFAREA"). Since the meaning of the information does not change (both are effective area), I would in fact suggest to keep the column name.

So to summarize, my proposal would be to have for true energy:

And for reconstructed energy:

Le 18 avr. 2016 à 11:13, GernotMaier notifications@github.com a écrit :

I agree that we should keep both effective areas (EFFAREA and EFFAREA_RECO): it really simplifies the analysis

On 17 Apr 2016, at 18:56, Christoph Deil notifications@github.com wrote:

I know that we have extensively discussed this in HESS, but I can't find the issue or minutes with the conclusions now. I'll try to summarise here.

For reference, the current version of the spec is here.

In HESS we produce four different effective area IRFs out of MC events:

• Binned in reco energy and true energy • With and without a RAD_MAX cut applied (point-like and full enclosure effective areas) For the energy binning we use the same one for reco and true energy, and like @jknodlseder I'm 👎 on introducing new columns for those separately in one HDU (too complex), although I'm not sure the alternative I'll suggest below (separate HDUs and a header key flag) is better.

So now the question becomes which AEFFs we want to support. I think getting rid of the point-like versions at DL3 is acceptable. Those are in use a lot now, but they can be computed from the full enclosure AEFF and the PSF, we have an open issue in the HESS-internal tracker that consistency here should be checked.

So that leaves two versions, like the EFFAREA and EFFAREA_RECO we have now. I think we want to keep EFFAREA_RECO as the standard way to run analyses taking EDISP into account in an approximate way, without having to apply EDISP in the science tools (one less inner loop, fast and no issues with low stats MC production where EDISP is very noisy / inaccurate).

Still, the format we have now is non-optimal. It should be better explained why two versions are useful, and providing either should be optional. Maybe it's best to just introduce a header flag whether the AEFF is as a function of reco or true energy, and to split the two AEFF out into separate HDUs?

In Gammapy we currently have a flag which version the user wants to read (see here). But this is not yet exposed to the end-user scripts and there's no internal logic yet when to load and use EDISP (currently we always use the EFFAREA column and apply EDISP). I.e. this is one of the things we urgently want to have described well in the spec and then fixed in Gammapy to support the faster analysis using EFFAREA_RECO (cc @joleroi ).

@gernotmaier or @kosack - Any comments on which AEFF versions you want to produce for CTA and that should be reflected in the DL3 spec?

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub


Dr Gernot Maier Deutsches Elektronensynchrotron (DESY) Platanenallee 6, D-15738 Zeuthen, Germany

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/issues/35#issuecomment-211289423

TarekHC commented 8 years ago

If would propose to use "ERECO" consistently for reconstructed energies. I don’t think that there is an OGIP standard for this (which is more focused on X-rays where this corresponds basically to detector channels).

I agree. We should be consistent. That would require to change the energy naming also in the event lists.

I guess it would however also be good to change the EXTNAME and HDUCLASS2 (and CAL_CNAM) keywords to avoid any confusion. A suggestion would be: - EXTNAME: "EFFECTIVE AREA (RECO)" - HDUCLASS2: "EFF_AREA_RECO"

Yes, this should be required. If not, the FITS files would not be very comprehensible.

We also need to think about whether we actually want to change the name of the effective area columns (so far: "EFFAREA"). Since the meaning of the information does not change (both are effective area), I would in fact suggest to keep the column name.

I agree.

So to summarize, my proposal would be to have for true energy: - EXTNAME: "EFFECTIVE AREA" - HDUCLASS2: "EFF_AREA" - ENERG_LO - ENERG_HI - THETA_LO - THETA_HI - EFFAREA And for reconstructed energy: - EXTNAME: "EFFECTIVE AREA (RECO)" - HDUCLASS2: "EFF_AREA_RECO" - ERECO_LO - ERECO_HI - THETA_LO - THETA_HI - EFFAREA

Ok. For now, this could be a better option than current standard.

Although I must say I don't like adding both effective areas. I would prefer all IRFs to be as a function of true energy. But if there is general agreement of also adding EFFAREA vs ERECO then I guess its fine.

As soon as I find some time, I will try to modify the text and do a pull request (if no one did it already).

jknodlseder commented 8 years ago

If would propose to use "ERECO" consistently for reconstructed energies. I don’t think that there is an OGIP standard for this (which is more focused on X-rays where this corresponds basically to detector channels).

I agree. We should be consistent. That would require to change the energy naming also in the event lists.

I have not though about that one.

I think all event lists I know of have ENERGY, but are of course all ERECO. I would not be in favor of using a different convention here, and would prefer keeping ENERGY in the event lists.

I recognise that this is formally inconsistent, as in the energy list we have reconstructed energy while in the IRF we have (in general) true energy. Yet all missions do it that way, hence I would rather favor doing as the others do.

I guess it would however also be good to change the EXTNAME and HDUCLASS2 (and CAL_CNAM) keywords to avoid any confusion. A suggestion would be: - EXTNAME: "EFFECTIVE AREA (RECO)" - HDUCLASS2: "EFF_AREA_RECO"

Yes, this should be required. If not, the FITS files would not be very comprehensible.

We also need to think about whether we actually want to change the name of the effective area columns (so far: "EFFAREA"). Since the meaning of the information does not change (both are effective area), I would in fact suggest to keep the column name.

I agree.

So to summarize, my proposal would be to have for true energy: - EXTNAME: "EFFECTIVE AREA" - HDUCLASS2: "EFF_AREA" - ENERG_LO - ENERG_HI - THETA_LO - THETA_HI - EFFAREA And for reconstructed energy: - EXTNAME: "EFFECTIVE AREA (RECO)" - HDUCLASS2: "EFF_AREA_RECO" - ERECO_LO - ERECO_HI - THETA_LO - THETA_HI - EFFAREA

Ok. For now, this could be a better option than current standard.

Although I must say I don't like adding both effective areas. I would prefer all IRFs to be as a function of true energy. But if there is general agreement of also adding EFFAREA vs ERECO then I guess its fine.

As soon as I find some time, I will try to modify the text and do a pull request (if no one did it already).

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/issues/35#issuecomment-213319010

cdeil commented 8 years ago

My comments:

cdeil commented 8 years ago

I've assigned @TarekHC for this and put it under the 0.2 milestone (i.e. to be done soon, within the next month).

TarekHC commented 8 years ago

Thanks Christoph. This definitely will push me to finally do it. I will do a pull request "soon" (a week?).

Cheers!

TarekHC commented 8 years ago

Problem solved with #43 Closed.