open-gamma-ray-astro / gamma-astro-data-formats

Data formats for gamma-ray astronomy
https://gamma-astro-data-formats.readthedocs.io
Creative Commons Attribution 4.0 International
29 stars 27 forks source link

GTI required or optional with EVENTS? #20

Open cdeil opened 8 years ago

cdeil commented 8 years ago

Should GTI (Good time interval) extensions be required or optional? Are they required to be in the same file as the EVENTS extension?

@mimayer - What's the status in ctools? Are there any concrete plans to change something? http://gamma-astro-data-formats.readthedocs.org/en/latest/events/index.html#disclaimer-about-names-of-hdu-and-file-structure

Does someone have time to look at the EVENTS and GTI specs from OGIP? Do existing FTOOLs handling EVENTS work if no GTI is present (for operations where it's not needed)? This is probably something where we should just follow the existing practices, no?

mimayer commented 8 years ago

I had a look at the ctools code and in fact, GTI is not required at all. If no GTI extension is found, ctools uses TSTART and TSTOP keywords to build the GTI internally. Therefore, we could change the GTI extension to be optional (in case it is identical with TSTART and TSTOP).

cdeil commented 8 years ago

OK, ... I'll make the change in the spec later today.

cdeil commented 8 years ago

I removed the EVENTS / GTI disclaimer for ctools in 9535bd7

mimayer commented 8 years ago

While working on supporting GTIs in a separate file in ctools, I have discussed with Jürgen about this. https://cta-redmine.irap.omp.eu/issues/1598

Especially regarding having GTIs optional and in a separate file, he is very sceptical. Here is the input from Jürgen about certain points on the docs:

I looked again over http://gamma-astro-data-formats.readthedocs.org/en/latest/data_storage/index.html, here some comments (not sure where to post them otherwise, you may just echo them where appropriate):

mimayer commented 8 years ago

Just to clarify: The idea is to make a GTI extension required in the same file as the events and to put the GTI extension name as a DSS keyword.

cdeil commented 8 years ago

Is having multiple EVENTS and GTI in one FITS file a use case we want to support?

Yes, pointing from EVENTS to GTI is an option. In OGIP they do that a lot, they also point to the IRFs from the EVENTS header. But that's a major change to what we've been discussing so far, which has no interlinks between HDUs. Thoughts?

Is there a description / webpage / document / spec / whatever of "DSS"? The Fermi ScienceTools are based on this, so clearly it's an option for IACTs worth considering as well. But again, it's a fundamental change / addition to use DSS and cut selection tracking via headers. Do we want to spec / prototype this?

mimayer commented 8 years ago

Is having multiple EVENTS and GTI in one FITS file a use case we want to support?

Yes this is still supported. Of course these EVENTS and GTIs have to have different extension names.

But that's a major change to what we've been discussing so far, which has no interlinks between HDUs. Thoughts?

I agree but I would argue that GTIs are special and very specific to the event list (unlike IRFs, which can be similar for different observations). GTIs are more like cut parameters and always have to be linked to the corresponding event list.

Is there a description / webpage / document / spec / whatever of "DSS"?

https://confluence.slac.stanford.edu/display/ST/Data+SubSpace+keywords http://hea-www.harvard.edu/~jcm/asc/docs/ps/SDS07.ps (section 1.13) or as pdf from Redmine https://cta-redmine.irap.omp.eu/attachments/download/1822/SDS07.pdf https://cta-redmine.irap.omp.eu/attachments/download/1823/fits_standard30aa.pdf

The Fermi ScienceTools are based on this, so clearly it's an option for IACTs worth considering as well. But again, it's a fundamental change / addition to use DSS and cut selection tracking via headers. Do we want to spec / prototype this?

Ctools already uses DSS selection quite extensively, e.g. when applying ctselect. I can add this to the spec.

cdeil commented 8 years ago

Related: https://cta-redmine.irap.omp.eu/boards/6/topics/213

cdeil commented 8 years ago

At the moment for HESS we have one run = one event list = one start / stop time = GTI superfluous. So I'd suggest to make it optional for now. Once the use cases for GTIs are identified, we can extend the spec and describe those. E.g. Nathan is splitting observations into chunks. How is this encoded in GTIs / HDUs?

Ctools already uses DSS selection quite extensively, e.g. when applying ctselect. I can add this to the spec.

That's a major addition. Probably a good one. So +1 to mention it in http://gamma-astro-data-formats.readthedocs.org/en/latest/info/index.html and http://gamma-astro-data-formats.readthedocs.org/en/latest/info/glossary.html If you introduce now pages and required / optional info, please do it via a pull request so that we can discuss it before it goes in.

mimayer commented 8 years ago

At the moment for HESS we have one run = one event list = one start / stop time = GTI superfluous. So I'd suggest to make it optional for now.

Well ctools can handle missing GTIs. But as far as I understand, this is not the way to go. Eventually all observations need GTIs. Maybe for CTA we will have only one observation per night with multiple GTIs - no one really knows. GTIs are generic enough to cover all the use cases we can currently think of.

E.g. Nathan is splitting observations into chunks. How is this encoded in GTIs / HDUs?

Not sure how Nathan does this exactly but I guess he has a python script that does all the work in splitting and assigning IRFs using certain criteria.

If you introduce now pages and required / optional info, please do it via a pull request so that we can discuss it before it goes in.

Ok

kosack commented 8 years ago

Just a comment: it is looking very likely that with CTA we will drop the concept of "runs", meaning that in a particular observation there may be events that are not appropriate for the attached IRF (e.g. events reconstructed during slewing that would need to be treated differently). For that reason, GTIs, even if trivially describing the run start and stop time, are very important. The first usable event may actually come thousands of events into the file, for example.

Therefore, even if it seems trivial, the software should be implemented to use them and not rely on the run start and stop headers (those can even be dropped, or considered only "human readable", while the tools should access the GTI to get the time bounds in a machine-readable format).

That also means that at least in a future release, there should be an extra loop somewhere over GTIs within a file, rather than assuming the events are contiguous. That even allows us to immediately do things like light-curve binning, PSR phases, and "cloud" removal, which are all nice features to have!

mimayer commented 8 years ago

Thanks for the input @kosack. @cdeil, given the feedback we got, what do you think of making GTIs required to be compliant with future versions?

cdeil commented 8 years ago

@kosack - Thanks for your comment! I wasn't aware that it might be common to have science tools apply GTIs.

@cdeil, given the feedback we got, what do you think of making GTIs required to be compliant with future versions?

I'm 50:50 on this.

Keeping the option to just have EVENTS with single start / stop time in the header and no GTI could still be allowed / the fallback if no GTI is present.

But I'm also OK with making GTI required and to make it very clear that science tools should read GTI, not EVENTS header keywords for this.

I still have several questions:

mimayer commented 8 years ago

How do livetime computations work if there's multiple GTIs? One needs to read the deadtime fraction from the events header and assume it's constant, right?

I guess for now lifetime is computed using GTIs (regardless where they come from). Eventually there will be more detailed information about the deadtime but currently it is assumed to be constant.

What is the recommended way to bundle many EVENTS / GTIS in one FITS file? (I think a FITS file can have many HDUs with the same name, so calling them all EVENTS and GTI is an option I think). So how does the HDU index table and / or the EVENTS link to the GTI then?

As Jürgen suggested, we could use the DSS keywords (like it is done in Fermi):

DSTYP1  = 'TIME    '
DSUNI1  = 's       '
DSVAL1  = 'TABLE   '
DSREF1  = ':GTI    '

The DSREF keyword gives the extension name of the GTI (required to be in the same file). I would suggest to remove GTIs from the HDU index and require them to be in the same file as the events (and link the extension from the EVENTS header).

Is our description for observation still OK or does it need to be modified? Do we stick with one set of IRFs per observation?

For now I would keep it that way. If the user wants to split up runs into chunks and use different IRFs it is possible but more effort which is fine for now.

@mimayer or anyone else - Are you willing to make a PR with a new proposal for GTIs / and a better description how they work? (for me this is not on the top of the list of things to spec / prototype, because we don't need it for HESS)

I don't have much time this week but I will try.

cdeil commented 8 years ago

I guess for now lifetime is computed using GTIs

No, at the moment I'm using LIVETIME FITS header keyword from the EVENTS extension in gammapy.data.EventList. Of course we can adapt to whatever solution we agree on in the spec.

This is a bit related to the idea to have extra TECH3 files ... info like deadtime fraction and good time intervals could be there. Is it a good idea to keep deadtime fraction and livetime header keywords in the EVENTS HDU?

we could use the DSS keywords

Can you please split that out into a separate issue (or even better yet: pull request)? I have to read over the existing DSS documents and think about how well it works for IACTs.

I would suggest to remove GTIs from the HDU index and require them to be in the same file as the events (and link the extension from the EVENTS header).

Why treat the GTI differently from the IRFs?

Having all HDUs in the HDU index table is convenient for data distribution / synchronisation. What do we gain by removing GTI from the HDU index?

If you want the links from EVENTS HDU to the GTI, I'd suggest to also add links from EVENTS to the IRFs, and to keep the HDU index as-is. Although: flexibility in which files HDUs are located / complexity is introduced when introducing a linking scheme.

mimayer commented 8 years ago

Is it a good idea to keep deadtime fraction and livetime header keywords in the EVENTS HDU?

As long as there is no "spacecraft" or whatever file these keywords should stay in the FITS header.

Can you please split that out into a separate issue (or even better yet: pull request)? I have to read over the existing DSS documents and think about how well it works for IACTs.

Yes

Why treat the GTI differently from the IRFs?

Because they are specific to the event lists (IRFs such as background can apply to different observations). From the event file it should be clear which GTIs correspond to the event lists. Event list shouldn't exist without a GTI definition.

Having all HDUs in the HDU index table is convenient for data distribution / synchronisation. What do we gain by removing GTI from the HDU index?

Coherent handling of GTIs and linking them to the event list. We can of course list the GTIs in the HDU index but I would strongly prefer to also link the GTIs from the EVENT header.

If you want the links from EVENTS HDU to the GTI, I'd suggest to also add links from EVENTS to the IRFs, and to keep the HDU index as-is. Although: flexibility in which files HDUs are located / complexity is introduced when introducing a linking scheme.

No, GTIs are different. Following the Fermi-LAT scheme GTIs are unique to the event list. IRFs, as said above) do not have to be linked. The user can e.g. choose to use different background models etc.

cdeil commented 8 years ago

We can of course list the GTIs in the HDU index but I would strongly prefer to also link the GTIs from the EVENT header.

Let's do that.

GTIs are different. Following the Fermi-LAT scheme GTIs are unique to the event list. IRFs, as said above) do not have to be linked. The user can e.g. choose to use different background models etc.

That's not clear yet. There could be different GTIs (depending on data quality or other criteria) for a given EVENTS HDU as well.

I agree that one can make arguments that GTI and BACKGROUND is special and should be treated differently than other IRFs. But it's not clear yet / there's no agreement yet how it should work / we're not currently doing this in the HDU index spec.

So bottom line ... pull requests welcome. And I think we should also organise a f2f meeting in the coming weeks / months to meet for a week and just discuss / write up all these details instead of only working via Github.

mimayer commented 8 years ago

There could be different GTIs (depending on data quality or other criteria) for a given EVENTS HDU as well.

No, if the user just updates GTIs according to some quality parameters.

I agree that one can make arguments that GTI and BACKGROUND is special and should be treated differently than other IRFs. But it's not clear yet / there's no agreement yet how it should work / we're not currently doing this in the HDU index spec.

It doesn't matter to me, we can list the GTIs in the HDU spec for completeness. But I think it is important that GTIs are linked from the events header too.

So bottom line ... pull requests welcome. And I think we should also organise a f2f meeting in the coming weeks / months to meet for a week and just discuss / write up all these details instead of only working via Github.

Sounds reasonable.

kosack commented 8 years ago

Here's a non-official diagram I made recently to show how GTIs might relate to observation blocks(which are just generalized runs that may have more events than the science tools would normally use in them). It's just a brainstorm right now, but illustrates some complex use cases. One might want for example to analyze data during slewing to make a slew survey, or select data when one set of telescopes in an array have stabilized and are tracking, while the others are still moving. Those special cases require different IRFs than the standard ones, so would need different GTI definitions.

Those would be realized via a tool e.g. ctmktime selection=standard, or "slew" or any of 1000 other possiblities, each with possibly different IRFs.

observing_blocks_no_runs

cdeil commented 8 years ago

@kosack - Thanks!

At the DL3 file / science tools level, analysing data taken while the telescopes slew (or the array changes in any other way) could be handled by defining (probably small) periods of time during which the IRFs are sufficiently constant. This could be done by having small "observations" each with one start / stop and be handled by the existing format / science tools.

Alternatively large extensions are needed so that one observation has multiple GTIs, each with their per-GTI IRFs, or even introducing TECH3 files that describe how the array changes, and have a larger IRF lookup database to compute IRFs for a given point in time on the fly. This is possible, and might even be needed for CTA, but defining this in the format specs and implementing it in the exporters / science tools is a huge project, and just not worth the effort for people that want to get results with HESS / VERITAS soon. Also, my understanding is that no-one in CTA is working on a spec / tools to handle this use case for DL3 at the moment.

kosack commented 8 years ago

@cdeil Yes that is the idea - you wouldn't have to have a single GTI file that has the "Standard" ,"slew", etc gtis. You just have the GTI that the user selected. So they would do:

etc.. So the science tools don't need to understand the difference between GTI types, that's up to the user to associate. They need a make-gti tool that creates a GTI that is valid for a given IRF + allows the user to make sub-selections within that GTI.

kosack commented 8 years ago

I guess that does lead to one possible advanced case: the user may want to generate several GTI extensions and not always overwrite one. I don't know if that is a use case in other telescopes, but it could be useful to have the name of the GTI extension flexible ("GTI" by default, but the user could select another if they want "MYGTI", etc). I guess that's the same as with the IRFs, where you don't want to hard-code the extension name.

cdeil commented 8 years ago

As part of a HESS presentation today I mentioned this DL3 effort. The main feedback by Werner Hofmann was:

@mimayer @kosack @jknodlseder @cboisson @joleroi @GernotMaier - Thoughts?

Should we start and change things now? Or make this a focus of the April IACT DL3 meeting and only there write down that new, more flexible data model and format spec?

TarekHC commented 8 years ago

Hi Christoph,

On Mon, Feb 22, 2016 at 2:56 PM, Christoph Deil notifications@github.com wrote:

As part of a HESS presentation today I mentioned this DL3 effort. The main feedback by Werner Hofmann was:

  • IRFs should be per-GTI for CTA. I guess with VERITAS already chunking runs into time slices with separate IRFs, we should adapt in HESS and in our specs and change the obs and HDU index files to allow multiple GTIs and IRFs per obs.

I believe the current thought is that DL3 "chunks" of data will be limited to periods of time in which the IRF3 are able to properly describe, within allowed systematics, EVT3 data. Is that correct?

Larger periods of time would just need the merging of the IRF3 phase space, so I don't see problems with that.

  • Event types (e.g. HESS 1 / 2) should be a priority. This is already possible in the various science tools by treating each event type as a separate observation. But to make this simple we need to add the bookkeeping to handle multiple event types (each with separate IRFs) for a given obs.

@mimayer https://github.com/mimayer @kosack https://github.com/kosack @jknodlseder https://github.com/jknodlseder @cboisson https://github.com/cboisson @joleroi https://github.com/joleroi @GernotMaier https://github.com/GernotMaier - Thoughts?

Should we start and change things now? Or make this a focus of the April IACT DL3 meeting https://github.com/open-gamma-ray-astro/2016-04_IACT_DL3_Meeting and only there write down that new, more flexible data model and format spec?

— Reply to this email directly or view it on GitHub https://github.com/open-gamma-ray-astro/gamma-astro-data-formats/issues/20#issuecomment-187185651 .

kosack commented 8 years ago

For simplicity, you can simply make the assumption that there may be multiple GTIs in a run, but they all share the same IRF (e.g. some data may be cut out, but there is by defauly still only one GTI/run). The advanced use of using another GTI subset with different IRF doesn't need to be handled by the software automatically, since the user can do it.

Therefore you have:

cdeil commented 6 years ago

I'm about to make the v0.2 version of the spec, so I'm moving this to v1.0.

I think CTA will now for the next year or two drive the DL3 spec and eventually define something. This discussion might be a useful record for previous discussions, but we're not going to resolve this question of time intervals on Github.