The Neutron star Interior Composition Explorer (NICER) is an International Space Station (ISS) payload devoted to the study of high-energy X-ray sources in the Universe. It is primarily designed to study neutron stars, but it can also study black holes in stellar systems and in active galactic nuclei (AGN).

Unlike CCD-based detectors, which allow one to simultaneously separate source photons from background photons by selecting the relevant regions on the CCD, NICER does not produce images, so estimating the background is a non-trivial task.
To help with background estimates, NICER observed regions of the sky that are known to contain no X-ray sources, called `BKGD_RXTE[1..8]`. Those observations were used to train a machine learning model that estimates the background in observations of targets of interest.
The basic idea is to use the MKF parameters that come with every observation, which contain information about the telescope and its environment during the observation, to estimate the background.
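To make the idea concrete, here is a minimal, illustrative sketch of a classification approach of this kind. It is *not* the actual nicerBgML implementation: the column names, the `PREFILTER` extension name, and the function itself are placeholders for illustration.

```python
import numpy as np
from astropy.table import Table

# Hypothetical feature columns, for illustration only; the real model
# uses a much larger, version-specific set of MKF parameters.
MKF_COLS = ["COR_SAX", "SUN_ANGLE", "FPM_OVERONLY_COUNT"]

def estimate_background(mkf_file, classifier, basis_spectra):
    """Sketch: classify each MKF time sample into one of the basis
    spectra, then combine the basis spectra weighted by the fraction
    of time assigned to each class."""
    mkf = Table.read(mkf_file, hdu="PREFILTER")
    features = np.column_stack([np.asarray(mkf[c], float) for c in MKF_COLS])
    # One basis-spectrum index per time sample (e.g. from an xgboost classifier).
    labels = classifier.predict(features)
    weights = np.bincount(labels, minlength=len(basis_spectra)) / len(labels)
    # The weighted combination approximates the background spectrum.
    return sum(w * s for w, s in zip(weights, basis_spectra))
```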
This repository contains the trained model and the spectral data that can be used to estimate the background of specific observations. The model was trained using xgboost, and the final product is provided here so it can be used directly.
There are 2 components in the model data:

- `nicerBgML.py`: a python script that reads the model file (`model.npz`) and produces a background spectrum for a given obsID.
- `nicerBgML.tgz`: the model data, containing the model file (`model.npz`) and the basis spectra `spec.*.pha`. The former is the numpy file that contains the trained model and related variables; the `spec.*.pha` files are the basis spectra that are used to construct the background spectrum after the modeling. This file can be downloaded here.

The script uses `fcurve` and `mathpha` from `ftools`, as well as `nicerl2`, so a functioning installation of `heasoft` is needed. The modeling was done using `heasoft-6.29c`, so it is recommended that the script be used with that version. Other versions of heasoft may fail to extract all the MKF parameters needed for the modeling.

The following python libraries are also needed:

- `numpy`
- `astropy`
- `pandas`
- `sklearn`
- `py-xgboost==1.3.3`
The python libraries can be installed with `conda` (e.g. `conda install numpy astropy pandas sklearn py-xgboost`). Note that `py-xgboost` is available only through conda, not through pip. See the "Known Issues" section below.
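To confirm that the xgboost build python picks up matches the pinned version, a quick check:

```python
import xgboost
print(xgboost.__version__)  # should match the pinned py-xgboost==1.3.3
```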
Version 0.4.t1.200e requires geomagnetic data files that can be downloaded from https://heasarc.gsfc.nasa.gov/FTP/caldb/data/gen/pcf/geomag (see the discussion here). Please download the following 6 files into some directory, and then provide the location of that directory as input to the `nicerBgML.py` script (see the USAGE section below):

- `dst_kyoto.fits`
- `f107_petincton.fits`
- `geomag.tar.gz`
- `kp_noaa.fits`
- `kp_potsdam.fits`
- `solarphi_oulu.fits`
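If scripting the download is convenient, here is a minimal sketch using only the URL and filenames listed above; the destination directory name is arbitrary:

```python
import os
import urllib.request

# URL and filenames exactly as listed above; the destination directory
# is what gets passed to nicerBgML.py as kpDir.
GEOMAG_URL = "https://heasarc.gsfc.nasa.gov/FTP/caldb/data/gen/pcf/geomag"
FILES = ["dst_kyoto.fits", "f107_petincton.fits", "geomag.tar.gz",
         "kp_noaa.fits", "kp_potsdam.fits", "solarphi_oulu.fits"]

def fetch_geomag(dest_dir):
    """Download the 6 geomagnetic files into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in FILES:
        urllib.request.urlretrieve(f"{GEOMAG_URL}/{name}",
                                   os.path.join(dest_dir, name))

fetch_geomag("geomag")
```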
See the Known Issues section below for a modification to the `fcurve` code that may be needed for the script to run correctly.
Download the `nicerBgML.py` and `nicerBgML.tgz` files, and unpack `nicerBgML.tgz`. Running `nicerBgML.py -h` prints some useful information on the usage:
```
usage: nicerBgML.py [-h] [--dataDir dataDir] [--modelFile modelFile] [-v] obsID kpDir

Estimate NICER background using Machine Learning. This is a basic version that uses 50 MPUs (standard minus 14 and 34). Version 0.4.t1.200e uses tBin=1
seconds and 50 spectral bins (nGrp). The main difference compared to other versions is that we model the spectra in two bands: 0.2-0.4 and 0.4-15, so the
model has the largest energy coverage of previously-released models. Similar to the 0.3 version model, we use more MKF parameters including the space
weather model.
- tBin is the time bin size used for constructing the model, and it is the time bin size that will be used when binning the MKF data.
- nGrp is the number of basis spectra used in the modeling.
The kpDir parameter should point to the directory containing the geomagnetic data: dst_kyoto.fits, f107_petincton.fits, geomag.tar.gz, kp_noaa.fits,
kp_potsdam.fits, solarphi_oulu.fits, available in: https://heasarc.gsfc.nasa.gov/FTP/caldb/data/gen/pcf/geomag
See https://heasarc.gsfc.nasa.gov/docs/nicer/analysis_threads/geomag/ for details.

positional arguments:
  obsID                The obsID for which the background spectrum is to be estimated
  kpDir                Location of the geomagnetic data. Download from: https://heasarc.gsfc.nasa.gov/FTP/caldb/data/gen/pcf/geomag/; There are 6 files:
                       dst_kyoto.fits f107_petincton.fits geomag.tar.gz kp_noaa.fits kp_potsdam.fits solarphi_oulu.fits

optional arguments:
  -h, --help           show this help message and exit
  --dataDir dataDir    The path to the directory containing the model data, including the basis spectra (default: nicerBgML)
  --modelFile modelFile
                       The name of the model npz file. Search in current folder and in dataDir (default: model.npz)
  -v, --version        show program's version number and exit
```
Assuming the model data was unpacked into the folder `nicerBgML`, then for some obsID (e.g. `4693011001`), we use:

```
> python nicerBgML.py --dataDir ./nicerBgML 4693011001 /location/of/geomagnetic/data/
```
If everything runs correctly, the background spectrum `spec.b.pha` will be created inside `4693011001/spec`.
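As a quick sanity check of the output, the PHA file is a standard FITS spectrum and can be inspected with astropy. This is a sketch; it assumes a `COUNTS` column, as is typical for mathpha output:

```python
from astropy.io import fits

# Quick look at the estimated background spectrum.
with fits.open("4693011001/spec/spec.b.pha") as hdul:
    spec = hdul["SPECTRUM"]
    print("exposure (s):", spec.header["EXPOSURE"])
    print("total counts:", spec.data["COUNTS"].sum())
```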
- **0.4.t1.200e (Latest)**: This is an updated version, published on December 6, 2021. The input parameters to the model are similar to the previous version, but it is optimized for the whole 0.2-15 keV band. It works by modeling the data in 2 bands: 0.2-0.4 and 0.4-15 keV. It is a classification model that samples the MKF parameters every 1 second and classifies the background data into 50 basis spectra. The root-mean-squared performance in the background estimates is 1.5 counts/s (vs 73 for the 3C50 model; a sketch of this metric follows the list). If the 1% outlier background observations are discarded, the performance is 0.7 counts/s (vs 9.6 for the 3C50 model), all measured over the whole 0.2-15 keV band.
- **0.3.t1.35**: This version uses the latest heasoft updates (as of August 2021). The model is optimized to work in the energy range 0.5-10 keV (unlike the previous version). It uses 40 MKF parameters, including those from the geomagnetic data. It is a classification model that samples the MKF parameters every 1 second and classifies the background data into 35 basis spectra. The root-mean-squared performance in the background estimates is 0.21 counts/s (vs 4.04 for the 3C50 model). If the 1% outlier background observations are discarded, the performance is 0.15 counts/s (vs 1.25 for the 3C50 model).
- **0.2.t4n20**: This is an enhanced version of 0.1.t4n20, released on July 11, 2021. It is based on more MKF parameters, 27 in total, including the KP index from the geomagnetic data. It is also a classification model that samples the MKF parameters every 4 seconds and classifies the background data into 20 basis spectra. The root-mean-squared performance in the background estimates is 1.7 counts/s (vs 3.2 for the 3C50 model). If the 1% outlier background observations are discarded, the performance is 0.87 counts/s (vs 2.3 for the 3C50 model).
- **0.1.t4n20**: This is the first model, presented at the NICER Observatory Science Working Group (OSWG) on June 30, 2021. It is based on a classification model that uses 15 parameters from the MKF file, sampled every 4 seconds, to classify the background data into 20 basis spectra. The root-mean-squared performance in the background estimates is 2.0 counts/s (vs 3.2 for the 3C50 model). If the 1% outlier background observations are discarded, the performance is 1.3 counts/s (vs 2.3 for the 3C50 model).
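For reference, the RMS figures quoted above correspond to a calculation along these lines. This is a sketch, not the actual evaluation code; `estimated` and `measured` stand for per-observation background rates over the relevant band:

```python
import numpy as np

def rms_performance(estimated, measured, outlier_frac=0.01):
    """RMS of background-rate residuals (counts/s), optionally
    discarding the worst outlier_frac of observations."""
    resid = np.asarray(estimated) - np.asarray(measured)
    if outlier_frac:
        keep = np.abs(resid) <= np.quantile(np.abs(resid), 1 - outlier_frac)
        resid = resid[keep]
    return np.sqrt(np.mean(resid**2))
```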
This is a basic version that uses 50 MPUs (the standard 52 minus 14 and 34) with the standard filtering criteria (`detlist=launch,-14,-34 min_fpm=50` in `nicerl2`). If you use a different number of MPUs, you will have to scale the background spectrum accordingly; a sketch follows below.
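One simple way to do the scaling is to inflate the `EXPOSURE` keyword, which rescales the background *rate* without touching the integer `COUNTS` column. This is a sketch, not part of the released script; the detector count and file names are hypothetical:

```python
from astropy.io import fits

N_MODEL = 50  # detectors assumed by the model
N_USED = 48   # hypothetical: detectors active in your own reduction

# Rescale the predicted background rate by N_USED/N_MODEL.
with fits.open("spec.b.pha") as hdul:
    hdul["SPECTRUM"].header["EXPOSURE"] *= N_MODEL / N_USED
    hdul.writeto("spec.b.scaled.pha", overwrite=True)
```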
If you use a filtering criterion that is different from the standard one in `nicerl2`, this model may not be applicable. I am working on ways to include non-standard selection criteria in the future.
If the script fails when running `fcurve`, it is likely because of a possible bug in `fcurve`: it cannot handle a long list of columns, and the number of columns used here (15) does not fit into the fortran character array used. A simple fix, until the bug is fixed permanently, is to change that manually and re-compile:

- Edit `heasoft-6.28/ftools/futils/tasks/fcurve/fcurve.f`: change `columns` and `outcols` to use a longer character length (the lines following `subroutine fcurve`), i.e. change `character(80)` to `character(N)` for some large number `N` (e.g. 2000).
- From `heasoft-6.29/ftools/futils`, recompile the code by running: `hmake; hmake install`
Installing xgboost may be problematic on some systems. If you are running python through anaconda (recommended), then you can just install py-xgboost (`conda install py-xgboost`); that should be all that is needed. Installing xgboost from pip may give you an error related to `libomp`; in that case, the simplest option is to use the conda package instead.

Please report other issues running the model script here.