Closed GarethCabournDavies closed 1 month ago
For testing, see https://ldas-jobs.ligo.caltech.edu/~gareth.cabourndavies/testoutput/plotting_efficiency/testing/ (LVK paywall)
I forgot to include the timing / memory usage statistics in the original post
I tested against v2.3.6 (old) and v2.3.7 (with the fix from #4724 there):
pycbc_page_coinc_snrchi
:
v2.3.6:
User time (seconds): 100.76
System time (seconds): 53.77
Maximum resident set size (kbytes): 31977992
v2.3.7:
User time (seconds): 90.18
System time (seconds): 10.12
Maximum resident set size (kbytes): 1836028
This PR:
User time (seconds): 86.60
System time (seconds): 11.40
Maximum resident set size (kbytes): 1836968
pycbc_page_foreground
:
v2.3.6:
User time (seconds): 217.28
System time (seconds): 58.59
Maximum resident set size (kbytes): 23740496
v2.3.7:
User time (seconds): 46.72
System time (seconds): 7.64
Maximum resident set size (kbytes): 364108
This PR:
User time (seconds): 40.50
System time (seconds): 7.52
Maximum resident set size (kbytes): 366408
The
pycbc_page_coinc_snrchi
code was using a lot of memory - this fixes that This also fixes the memory usage ofget_snglfile_array_dict
inForegroundTriggers
Standard information about the request
This is an efficiency update, with a minor bugfix involved as well This change affects the offline search This change could change result presentation / plotting, but it doesn't outwith the bugfix
This change, follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines
Motivation
The
pycbc_page_coinc_snrchi
code was using a lot of memory This also fixes the memory usage ofget_snglfile_array_dict
inForegroundTriggers
, as used bypycbc_page_foreground
There was a minor bug inpycbc_page_coinc_snrchi
not taking the -1 trigger ids into account properly - this has been fixedContents
Both these cases use the general idea of using boolean arrays to pre-cut the single-detector trigger files when reading. The
pycbc_page_coinc_snrchi
does this through the SingleDetTriggers interface.They both have additional complication that the boolean mask method loses order / duplicate index information. This is solved using the numpy
unique()
function'sreturn_inverse
keyword / output, which provides this information.I looked up whether there are faster / better implementations than the numpy unique function given that we don't use the sorted array, I found this implementation but it looks to be memory intensive, which is the problem we are solving here! It seems to be quick enough anyway
Links to any issues or associated PRs
https://github.com/gwastro/pycbc/issues/4761 https://github.com/gwastro/pycbc/pull/4724
Testing performed
Within the codes, I kept the original (as at version 2.3.6) implementations, and then did elementwise comparison of the outputs:
bkg_snr
,bkg_chisq
,inj_snr
andinj_chisq
in thepycbc_page_coinc_snrchi
code, andcurr
(andlgc
to be safe) inget_snglfile_array_dict
. These were all identical.I compared the output files with diff, and the page_foreground outputs differed by metadata. The
pycbc_page_coinc_snrchi
output plots differed, but this is due to the bugfix described above.