v-morello / riptide

Pulsar searching with the Fast Folding Algorithm (FFA)
https://riptide-ffa.readthedocs.io/
MIT License
30 stars 11 forks source link

Error parsing .inf files from X-ray data #1

Closed paulray closed 3 years ago

paulray commented 3 years ago

I tried running rseek on some X-ray data I had and it has trouble parsing some of the entries. I was able to make it work by deleting lines from the .inf file, but it would be nice if it would handle X-ray data without a complaint.

Here is the .inf file:

 Data file name without suffix          =  sgr_bary
 Telescope used                         =  NICER
 Instrument used                        =  XTI
 Object being observed                  =  SGR_1830-0645
 J2000 Right Ascension (hh:mm:ss.ssss)  =  00:00:00
 J2000 Declination     (dd:mm:ss.ssss)  =  00:00:00
 Data observed by                       =  NICER
 Epoch of observation (MJD)             =  59132.774101755
 Barycentered?           (1=yes, 0=no)  =  1
 Number of bins in the time series      =  128000
 Width of each time series bin (sec)    =  0.0078125
 Any breaks in the data? (1=yes, 0=no)  =  0
 Type of observation (EM band)          =  X-ray
 Field-of-view diameter (arcsec)        =  180.00
 Central energy (kev)                   =  4.2
 Energy bandpass (kev)                  =  3.8
 Data analyzed by                       =  paulr
 Any additional notes:
    None

It gets errors like this (and if you fix this it barfs on the next line...):

  File "/Users/paulr/env/pint/bin/rseek", line 8, in <module>
    sys.exit(main())
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/apps/rseek.py", line 172, in main
    run_program(args)
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/apps/rseek.py", line 112, in run_program
    ts = loader(args.fname)
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/timing.py", line 11, in wrapped
    output = func(*args, **kwargs)
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/time_series.py", line 290, in from_presto_inf
    inf = PrestoInf(fname)
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/reading/presto.py", line 87, in __init__
    items = inf2dict(fobj.read())
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/reading/presto.py", line 79, in inf2dict
    return parse_pairs(pairs, extra_lines)
  File "/Users/paulr/env/pint/lib/python3.8/site-packages/riptide/reading/presto.py", line 72, in parse_pairs
    items[keyname] = keytype(value)
ValueError: could not convert string to float: 'X-ray'
v-morello commented 3 years ago

Hi @paulray,

The PRESTO inf parser is indeed rather crude and has only been tested on radio observations. That shouldn't be too difficult to fix, and I could add the above file as a test case.

However, a much more important concern here is that riptide is currently designed under the assumption that the background noise in the data follows a Gaussian distribution. This is fine for radio data, but most likely not for X-ray data (low photon counts and Poissonian noise statistics) unless the background count rate per time bin is "high". It is possible that making riptide work with X-ray data might require a lot more careful thought and effort than just fixing this parsing problem. It all boils down to the question: what is the best test statistic for the presence of a pulse in a noisy folded profile ? With Gaussian noise statistics, matched filters are optimal, and that's what is currently implemented. With Poissonian noise statistics, this might not be the case at all. I'd have to read up on this and give it some thought before making any further statement.

I guess giving it a go anyway is a good starting point, but I suspect that the output periodogram could be plain nonsense in some cases. I have no experience in searching X-ray data for periodicity, so if you have any comments on the above, please go ahead.

paulray commented 3 years ago

Yeah, I agree. I was just playing with it. For what it is worth, the SGR period popped right out!

rseek -f presto --bmin 16 --bmax 64 --Pmax 20.0 --rmed_width 32 sgr_bary.inf
2020-10-23 09:08:40,199          timing.py:14   DEBUG    'from_presto_inf' runtime: 17.80 ms
2020-10-23 09:08:40,199           rseek.py:114  DEBUG    Searching period range [1.0, 20.0] seconds with 16 to 64 phase bins
2020-10-23 09:08:40,205          timing.py:14   DEBUG    'deredden' runtime: 6.49 ms
2020-10-23 09:08:40,217          timing.py:14   DEBUG    'ffa_search' runtime: 18.32 ms
2020-10-23 09:08:40,222          timing.py:14   DEBUG    'find_peaks' runtime: 4.68 ms
        period          freq  width    ducy     dm    snr
  10.412829634   0.096035375      4   9.76%   1.00   21.6

But indeed, it should use a Poisson likelihood for the test statistic so it would have to be changed to be optimal for X-ray data. I was just curious. @kerrm has thought a lot about optimal test statistics for Poisson data, so he might have some good ideas. Also, the dereddening needs are different for X-ray. In many cases the power spectra are perfectly white, but in others there is strong red (or band-limited) noise.

v-morello commented 3 years ago

Well, glad to see that ! We could definitely look into extending riptide for X-ray data, especially if you have some good ideas already floating around. In the meantime, I'll fix the parsing, hopefully in the next couple of days.

astrogewgaw commented 3 years ago

@paulray, @v-morello Just had a doubt regarding the parsing of inf files: are the radio and X-ray cases the only two types of inf files that have to be dealt with? From what I can see in the file that @paulray has provided, the only part that's different is the one below theType of observation (EM band) field, so will it be okay to assume that the variables below that are the only ones changing for radio and X-ray data? also, are those fields (for X-ray data) fixed? or could some of them be absent from the inf file entirely?

paulray commented 3 years ago

Probably @scottransom can explain best what the official .inf file format is.

scottransom commented 3 years ago

Here is an example using x-ray data. Feel free to use it as a template.

 Data file name without suffix          =  CasA_HRC_all
 Telescope used                         =  Chandra
 Instrument used                        =  HRC-S
 Object being observed                  =  CasA CCO
 J2000 Right Ascension (hh:mm:ss.ssss)  =  23:23:27.919
 J2000 Declination     (dd:mm:ss.ssss)  =  +58:48:42.55
 Data observed by                       =  Deepto Chakrabarty
 Epoch of observation (MJD)             =  54910.912887586637225
 Barycentered?           (1=yes, 0=no)  =  1
 Number of bins in the time series      =  1073741824
 Width of each time series bin (sec)    =  0.001
 Any breaks in the data? (1=yes, 0=no)  =  0
 Type of observation (EM band)          =  X-ray
 Field-of-view diameter (arcsec)        =  3.000000
 Central energy (kev)                   =  1.000000
 Energy bandpass (kev)                  =  5.000000
 Data analyzed by                       =  Scott Ransom
 Any additional notes:
    Full ms-resolution analysis
v-morello commented 3 years ago

@astrogewgaw Indeed the only part that changes is the one below Type of observation (EM band), to the exception of data generated with PRESTO's makedata. The "official" way to parse can be found here: https://github.com/scottransom/presto/blob/master/src/ioinf.c

My new implementation pretty much follows that, I'll push it later today.

v-morello commented 3 years ago

I have pushed a fix and tagged a new version v0.2.1, see changelog. X-ray and Gamma PRESTO inf files are now supported out of the box. Creating a TimeSeries object from a high-energy data file prints a warning about possible non-Gaussian noise statistics. Example:

$ rseek -f presto test_xray.inf 
/home/vince/repositories/riptide/riptide/time_series.py:315: UserWarning:  You have loaded file 
'test_xray.inf', which contains data observed at a high-energy band 'X-ray'. riptide is NOT designed 
to process low photon count time series, i.e. where the background noise statistics are 
non-Gaussian. Be VERY careful when interpreting any search outputs.
  warnings.warn(msg, category=UserWarning)

@paulray Can you please have another go on your data to be extra sure ? If all goes fine I will close this.

paulray commented 3 years ago

Thanks, I tried it and indeed it runs fine on my X-ray data now. Thanks! I like the warning about Poisson statistics as well. It isn't clear to me in what cases FFA might be useful on high energy data, but it is nice to easily be able to try it out.

kerrm commented 3 years ago

Hi All,

Just catching up and chiming in. I had a look through the code and I'd have to spend some more time looking at the C++ implementations to understand how the FFA actually works to offer useful suggestions. So just considering use cases for Fermi data (very sparse) and NICER data (somewhat less sparse), here are some simple thoughts:

-- both can be treated as weighted time series, especially for the purposes of time-domain operations (filtering, dereddening, etc.) -- Fermi weights are useful (critical) for boosting S/N -- NICER weights aren't so useful, but starting off with weights=1 and then using weights!=0 to implement time-domain operations would be useful -- just treating things as a weighted time series, you'd lose Poisson likelihood, but you could use statistics like Z^2 and H to compute periodograms and S/Ns. Z^2 in particular follows the same distributions as you get from gaussian (power) periodograms -- it does seem somehow wasteful to do a fast fold and then have to go back and calculate all of the trigonometric moments, it becomes much more like a traditional FFT-like search, so I don't know how that plays out; maybe a two-stage process of finding the candidates assuming everything is a simple time series then refining using Z^2/H or a template-based log likelihood for a refined significance

Cheers, Matthew

On Fri, 23 Oct 2020 at 11:01, Paul Ray notifications@github.com wrote:

Yeah, I agree. I was just playing with it. For what it is worth, the SGR period popped right out!

rseek -f presto --bmin 16 --bmax 64 --Pmax 20.0 --rmed_width 32 sgr_bary.inf 2020-10-23 09:08:40,199 timing.py:14 DEBUG 'from_presto_inf' runtime: 17.80 ms 2020-10-23 09:08:40,199 rseek.py:114 DEBUG Searching period range [1.0, 20.0] seconds with 16 to 64 phase bins 2020-10-23 09:08:40,205 timing.py:14 DEBUG 'deredden' runtime: 6.49 ms 2020-10-23 09:08:40,217 timing.py:14 DEBUG 'ffa_search' runtime: 18.32 ms 2020-10-23 09:08:40,222 timing.py:14 DEBUG 'find_peaks' runtime: 4.68 ms period freq width ducy dm snr 10.412829634 0.096035375 4 9.76% 1.00 21.6

But indeed, it should use a Poisson likelihood for the test statistic so it would have to be changed to be optimal for X-ray data. I was just curious. @kerrm https://github.com/kerrm has thought a lot about optimal test statistics for Poisson data, so he might have some good ideas. Also, the dereddening needs are different for X-ray. In many cases the power spectra are perfectly white, but in others there is strong red (or band-limited) noise.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/v-morello/riptide/issues/1#issuecomment-715396657, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3LAG5DAQIGUTMXXG7A7A3SMGLGBANCNFSM4S4RMK4A .