v-morello / clfd

Smart RFI removal algorithms to be used on folded pulsar search and timing data
MIT License

Axis value NaN/inf when h5 file has data making corner plot #2

Open msthomps opened 9 months ago

msthomps commented 9 months ago

Hello,

I am working with CHIME timing data and running clfd. I want to produce the two diagnostic plots shown in your README, i.e. the corner plot and the profile mask. However, I run into the error shown in Code Block 1 below. It doesn't occur for all my files, but when it does, clfd appears to run successfully on the file, and the .h5 report shows that the */block#_values datasets are populated (see Code Block 2). I'm not sure where the Inf/NaN value that triggers the error comes from, or why it prevents the plots from being generated. Apologies if this is a misunderstanding on my end, but I've exhausted my options and can't find what's causing the error. Any clarification would be appreciated.

 # Code block 1
 Traceback (most recent call last):
   File "/project/6004902/msthomps/my_timing/test/clfd_reports/clfd_scripts/clfd_beam_4_plts.py", line 7, in <module>
     cr = [Report.load(f).corner_plot().savefig(f[:-10]+'_crnr.png') for f in beam_4_files]
   File "/project/6004902/msthomps/my_timing/test/clfd_reports/clfd_scripts/clfd_beam_4_plts.py", line 7, in <listcomp>
     cr = [Report.load(f).corner_plot().savefig(f[:-10]+'_crnr.png') for f in beam_4_files]
   File "/project/6004902/chimepsr-software/v1/pkgs/clfd/2021-04-27/install/lib/python3.10/site-packages/clfd/report.py", line 182, in corner_plot
     fig = CornerPlot(self).plot(**kwargs)
   File "/project/6004902/chimepsr-software/v1/pkgs/clfd/2021-04-27/install/lib/python3.10/site-packages/clfd/report_plots.py", line 119, in plot
     self._histogram(xname)
   File "/project/6004902/chimepsr-software/v1/pkgs/clfd/2021-04-27/install/lib/python3.10/site-packages/clfd/report_plots.py", line 94, in _histogram
     ax.set_xlim(xmin, xmax)
   File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/scipy-stack/2023b/lib/python3.10/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
     return func(*args, **kwargs)
   File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/scipy-stack/2023b/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 3650, in set_xlim
     return self.xaxis._set_lim(left, right, emit=emit, auto=auto)
   File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/scipy-stack/2023b/lib/python3.10/site-packages/matplotlib/axis.py", line 1184, in _set_lim
     v0 = self.axes._validate_converted_limits(v0, self.convert_units)
   File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/scipy-stack/2023b/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 3570, in _validate_converted_limits
     raise ValueError("Axis limits cannot be NaN or Inf")
ValueError: Axis limits cannot be NaN or Inf

The contents of the *.h5 report saved from the clfd output

#  Code Block 2
(astro-work) [msthomps@cedar5 clfd_reports]$ h5ls -r *_clfd_report.h5
/                        Group
/features                Group
/features/axis0          Dataset {3}
/features/axis1_label0   Dataset {93184}
/features/axis1_label1   Dataset {93184}
/features/axis1_level0   Dataset {91}
/features/axis1_level1   Dataset {1024}
/features/block0_items   Dataset {1}
/features/block0_values  Dataset {93184, 1}
/features/block1_items   Dataset {2}
/features/block1_values  Dataset {93184, 2}
/frequencies             Group
/frequencies/axis0       Dataset {1}
/frequencies/axis1       Dataset {1024}
/frequencies/block0_items Dataset {1}
/frequencies/block0_values Dataset {1024, 1}
/header                  Group
/profmask                Group
/profmask/axis0          Dataset {1024}
/profmask/axis1          Dataset {91}
/profmask/block0_items   Dataset {1024}
/profmask/block0_values  Dataset {91, 1024}
/stats                   Group
/stats/axis0             Dataset {3}
/stats/axis1             Dataset {6}
/stats/block0_items      Dataset {3}
/stats/block0_values     Dataset {6, 3}
/zap_channels            Group
/zap_channels/axis0      Dataset {1}
/zap_channels/axis1      Dataset {1024}
/zap_channels/block0_items Dataset {1}
/zap_channels/block0_values Dataset {1024, 1}

v-morello commented 9 months ago

Hi Mercedes,

All the plot limits are calculated in this function, lines 49-51: https://github.com/v-morello/clfd/blob/cf3dd7f058eca038672cea62fc77a397d05feeff/clfd/report_plots.py#L49

med[name] and iqr[name] are the median and interquartile range of one of the data features (e.g. peak-to-peak, stddev, etc.), and some of these must be Inf or NaN. That in turn means the input data in your folded cube contain some Infs or NaNs. I am almost positive it's NaNs, because the numpy functions used to compute the median and the interquartile range of an array are fairly robust to Inf values, whereas a single NaN propagates to the result. Here's an illustration:

In [1]: from numpy import percentile, nan, inf

In [3]: percentile([1,2,3,4,5, nan], 50)
Out[3]: nan

In [7]: percentile([-inf, 1,2,3,4,5, +inf], 50)
Out[7]: 3.0
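
To make the propagation concrete, here is a minimal sketch of how a single NaN in one feature would end up in the axis limits. The `limits` helper and its `k` factor are hypothetical stand-ins for a "median plus or minus some multiple of the IQR" range; they are not clfd's actual code in report_plots.py. Only the numpy calls are real.

```python
import numpy as np


def limits(values, k=5.0):
    """Hypothetical stand-in for a median +/- k * IQR axis range (not clfd's code)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return med - k * iqr, med + k * iqr


clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
dirty = np.append(clean, np.nan)  # same values plus one NaN

lo_clean, hi_clean = limits(clean)
lo_dirty, hi_dirty = limits(dirty)

print("clean limits:", lo_clean, hi_clean)  # finite values
print("dirty limits:", lo_dirty, hi_dirty)  # both NaN

# np.nanpercentile skips NaNs, which is handy for checking what the
# statistic would have been without them.
print("median ignoring NaN:", np.nanpercentile(dirty, 50))
```

Passing the second pair of limits to `ax.set_xlim` is what would raise the "Axis limits cannot be NaN or Inf" error from the traceback.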

I think the most practical option at this stage is to find out whether you have NaNs in your input data. One way (up to you) would be to install clfd in editable mode in its own Python environment (the README describes how to do this), and strategically insert some debug print statements to try and confirm whether you do indeed have NaNs in the original input data. If so, I would argue that this is the actual problem. If not, then there's a more subtle issue within clfd and we can try to fix that.
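
As a starting point for that check, here is a minimal sketch of a helper you could call from a debug statement. It assumes you already have the folded data as a numpy array of shape (num_subints, num_channels, num_bins); the function name `summarize_nonfinite` is hypothetical, and how you obtain the array from your archive is deliberately left out.

```python
import numpy as np


def summarize_nonfinite(cube):
    """Report where NaN/Inf values sit in a folded data cube.

    `cube` is assumed to have shape (num_subints, num_channels, num_bins).
    """
    cube = np.asarray(cube, dtype=float)
    nan_count = int(np.isnan(cube).sum())
    inf_count = int(np.isinf(cube).sum())
    # A profile counts as bad if any of its phase bins is non-finite.
    bad_profiles = ~np.isfinite(cube).all(axis=-1)
    bad_pairs = [(int(s), int(c)) for s, c in np.argwhere(bad_profiles)]
    return {"nan": nan_count, "inf": inf_count, "bad_profiles": bad_pairs}


# Synthetic example: 2 sub-integrations, 3 channels, 4 phase bins.
demo = np.zeros((2, 3, 4))
demo[0, 1, 2] = np.nan
demo[1, 2, 0] = np.inf

summary = summarize_nonfinite(demo)
print(summary)
```

The `bad_profiles` list holds (subint index, channel index) pairs, so you can see whether the non-finite values cluster in particular channels or sub-integrations.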

v-morello commented 8 months ago

Hi @msthomps, have you been able to check your input data and/or make some progress on this?