gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

Add metadata to pycbc_inspiral for Pegasus to ingest #2959

Open duncan-brown opened 5 years ago

duncan-brown commented 5 years ago

Following the example given in https://jira.isi.edu/browse/PM-1398 we should add metadata to the stdout of pycbc_inspiral that Pegasus can ingest. The format is

{
    "ts": 1437688574,
    "monitoring_event": "metadata",
    "payload": [
      {
        "name": "ncores",
        "value": 3
      },
      {
        "name": "nfilters",
        "value": 3232
      },
      {
        "name": "ntemplates",
        "value": 2323
      },
      {
        "name": "run_time",
        "value": 2323
      },
      {
        "name": "setup_time",
        "value": 2323
      }
    ]
}

where ts is the Unix timestamp at which the metadata is written, and the payload can contain any number of name/value entries.
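A minimal sketch of emitting such an event from Python using only the standard library; the function name and the metric values are hypothetical, chosen just to illustrate the format above:

```python
import json
import time

def emit_monitoring_event(metrics):
    """Build a Pegasus-style monitoring event from a {name: value} dict,
    print it to stdout as a single JSON line, and return it."""
    event = {
        "ts": int(time.time()),
        "monitoring_event": "metadata",
        "payload": [{"name": k, "value": v} for k, v in metrics.items()],
    }
    print(json.dumps(event))
    return event

# Hypothetical values for illustration only.
emit_monitoring_event({"ncores": 3, "nfilters": 3232, "ntemplates": 2323})
```

Printing the event as one JSON object per line keeps it easy for a log-scraping consumer to pick out of the rest of the job's stdout.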

@stevereyes01 it would be good to see what metadata about runtime @spxiwh is already generating and output that.

stevereyes01 commented 5 years ago

What is the best way to push info from pycbc_inspiral to stdout? Should I just write a module that pycbc_inspiral calls to format all of this for stdout? I imagine that might clutter the output for anybody trying to debug pycbc_inspiral, though.

stevereyes01 commented 5 years ago

Probably better would be to make a separate executable that takes the available information from the output HDF file of pycbc_inspiral and writes it out in the format that Grafana wants. Then we'll have to hook it into the workflow proper.
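A sketch of what such a standalone converter could look like. This is an assumption-laden illustration, not the actual pycbc layout: the "search" group name is a guess at where the performance attributes live, and h5py is assumed to be available.

```python
import json
import sys
import time

def to_monitoring_event(metrics):
    """Wrap a {name: value} dict in the monitoring-event structure
    described in the issue (ts / monitoring_event / payload)."""
    return {
        "ts": int(time.time()),
        "monitoring_event": "metadata",
        "payload": [{"name": k, "value": v}
                    for k, v in sorted(metrics.items())],
    }

def scalar_attrs(attrs):
    """Coerce HDF5 attribute values (often numpy scalars) to plain
    Python values so the event is JSON-serializable."""
    return {k: (v.item() if hasattr(v, "item") else v)
            for k, v in dict(attrs).items()}

if __name__ == "__main__":
    # Hypothetical usage: convert_metadata.py <inspiral_output.hdf>
    import h5py  # assumed available in the pycbc environment
    with h5py.File(sys.argv[1], "r") as f:
        metrics = scalar_attrs(f["search"].attrs)  # group name is a guess
    print(json.dumps(to_monitoring_event(metrics)))
```

Keeping the event-building logic in a pure function separate from the HDF reading makes it easy to unit-test without real inspiral output files.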

duncan-brown commented 5 years ago

The first question is what is produced by event_mgr.save_performance() here: https://github.com/gwastro/pycbc/blob/master/bin/pycbc_inspiral#L443 which is defined here: https://github.com/gwastro/pycbc/blob/master/pycbc/events/eventmgr.py#L371

duncan-brown commented 5 years ago

https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/

stevereyes01 commented 5 years ago

This looks pretty doable. Thanks, I'll get started on this.

josh-willis commented 5 years ago

First question is what is produced by event_mgr.save_performance() here: https://github.com/gwastro/pycbc/blob/master/bin/pycbc_inspiral#L443 defined here https://github.com/gwastro/pycbc/blob/master/pycbc/events/eventmgr.py#L371

So the information saved there is then used to compute performance metrics, which are saved into the output file under the search group in the HDF file. That's all written here: https://github.com/gwastro/pycbc/blob/master/pycbc/events/eventmgr.py#L497-L509 The main point is that what's interesting is not just the information saved by event_mgr.save_performance(), but what's calculated from it.

Can I ask what the context is? Is it mainly combining this information with the other information about machine type, etc, that pycbc_inspiral isn't directly aware of?

stevereyes01 commented 5 years ago

Hey @josh-willis, Duncan knows more about this than I do, but we would like to record more metadata from pycbc_inspiral to put into a dashboard, so we can compare various runs and more.

josh-willis commented 5 years ago

@stevereyes01 OK. Just so you know, the information that is produced now is used to generate the plots in section 8.01, specifically by pycbc/bin/hdfcoinc/pycbc_plot_throughput, in case you are interested in what's done now (which is part of the results page, I realize, and not a dashboard).

I also have open PR #2988 . The main point is that when trying to get an "average" templates per core across several runs, it's better to use the harmonic mean than just the regular mean. This has always been somewhat better, but really makes a big difference when doing injection runs, now that we are sorting the bank by mchirp. The regular mean is quite misleading in that case. So you might want your eventual dashboard to take that into account if comparing performance for different sites or hardware, for instance.
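The point about averaging can be illustrated with the standard library; the throughput numbers below are made up, but show the pattern: a few unusually fast jobs pull the arithmetic mean far above the typical rate, while the harmonic mean (the right average for rates like templates per core per second) stays representative.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-job throughputs (templates per core per second).
# Sorting the bank by mchirp can make injection-run throughput very
# uneven, so one fast outlier can dominate the arithmetic mean.
throughputs = [10.0, 12.0, 11.0, 400.0]

print(mean(throughputs))           # pulled up by the 400.0 outlier
print(harmonic_mean(throughputs))  # stays near the typical ~10-12 rate
```

The harmonic mean corresponds to total work divided by total time, which is usually what a dashboard comparing sites or hardware actually wants.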

stevereyes01 commented 5 years ago

Thank you for the link. Yes, I saw your PR on the harmonic mean. I'll take a look at all of the metadata that we're outputting and work with the dashboard folks to see what's most helpful for understanding how the runs perform across machines, cores, etc.