gwosc-tutorial / gwosc-catalog

Scripts and modules to be used in community catalog uploads to GWOSC website

Issues with Schema.json and/or ias-o3a.json Jun 25, 2024 #25

Closed DrKentB closed 4 months ago

DrKentB commented 4 months ago

I've begun "testing the waters" with this example JSON file. I've hit upon two main issues: first, required metadata is missing; second, that missing metadata is insufficiently described in the schema (in my opinion, as of writing this)...

1) the example JSON file is missing the "strain_channel", and as far as I can tell this is required (see Reference below)

2) the strain_channel can be different for each IFO (detector): LIGO and Virgo do not use the same naming conventions, and occasionally a custom channel has been used in the past to shoehorn in a detector's data, e.g., when the detector was not in observing mode or had a glitch problem that was automatically vetoed.

3) The example file has an undocumented search attribute (key) labeled "search_statistics". This is not in the schema documentation referenced in my previous email. In fact, if you look at the schema.json file for guidance you will see "parameters" instead. Using "parameters" would be consistent with what is being done in the "pe_sets" and hence should probably be adopted instead of "search_statistics", but that can be debated if others feel strongly one way or the other. However this comes down, the "Key Description" in the Reference documents neither "search_statistics" nor "parameters" (which is found in the schema.json file), and it should for completeness.

4) The use of sigfigs in GWOSC is not really sigfigs ... it is the number of decimal places. Propagating this pseudo-definition of sigfigs is an issue when non-GWOSC team members are involved. The ias-o3a.json file has adopted the true definition of sigfigs (which I think is the correct one to use in the schema definition), but the pseudo-definition requires complex calculations to determine, e.g., when scientific notation is used (which has its own funny rules inside of GWOSC). It would be good to find a simpler solution for specifications vs implementation vs ingestion vs visualization. HERE IS AN EXAMPLE: {value in ias-o3a is "300" with sigfigs "1"} -> 300.0 in GWOSC, unless the sigfigs value is massaged by hand or by a rather crafty algorithm that handles all the ways of writing numbers in different notations.
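To make the mismatch in (4) concrete, here is a minimal Python sketch. The function names are hypothetical and this is not GWOSC's actual ingestion code; it only assumes that one side treats the number as a count of decimal places and the other as true significant figures.

```python
from decimal import Decimal


def format_decimal_places(value, places):
    """Treat `places` as a number of decimal places (the pseudo-definition)."""
    return f"{value:.{places}f}"


def format_sig_figs(value, sigfigs):
    """Treat `sigfigs` as true significant figures, printed in plain notation."""
    if value == 0:
        return "0"
    d = Decimal(str(value))
    # adjusted() is the power of ten of the leading digit, e.g. 2 for 300.
    quantum = Decimal(1).scaleb(d.adjusted() - sigfigs + 1)
    return format(d.quantize(quantum), "f")


print(format_decimal_places(300, 1))  # the "300.0" case from the example
print(format_sig_figs(300, 1))        # what a true 1-sigfig reading gives
print(format_sig_figs(1234, 3))
```

The same stored pair (value 300, sigfigs 1) renders differently depending on which reading is applied, which is why the schema needs to state one definition explicitly.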

5) All events in the GWOSC event model have an associated "run", e.g., O3a, etc. I could imagine a community catalog that spans multiple runs, in which case a unique "run" would not be possible for the whole catalog. Is this something that could cause an issue in the event portal? (This example doesn't bump into this problem, but I'm thinking about what is needed to make the schema.json complete.)

6) Should all Community Catalog events have their own separate base "event" model instance or should the event version increment up from the internal GWOSC base event? I will assume that community catalogs use the same base "event" and have only a unique version for now, but worth thinking about. In addition, for Community Catalog Events that do not have a prior base "event" object a new one will be added. Hope this makes sense to others.

7) The preferred search result is not being specified. It is possible to have more than one search result in which case it is necessary to 'inform' which one is preferred, e.g., ...

      "search": [
        {
          "pipeline_name": "IAS",
          "search_statistics": [

needs to be:

      "search": [
        {
          "pipeline_name": "IAS",
          "preferred": true,            <--- This needs to be true for one and only one search in the list.
          "search_statistics": [

8) It would be good if there were documentation mapping the attributes in the schema.json file to where they will be visualized on the event web page. One particularly confusing example is the 'data_url' attribute, which through various levels of translation shows up as the 'source file' link on the web page. (FOR INTERNAL GWOSC CONSIDERATION: the 'source file' is actually called 'source_location' in the ParameterSet Model - maybe for consistency, calling it data_url would be clearer).

9) the 'pe_sets', as in (7) above, is missing an assignment of which one in the list is preferred, and again, only one can have preferred set to true:

      "pe_sets": [
        {
          "pe_set_name": "IASPrior",
          "data_url": "https://github.com/seth-olsen/new_BBH_mergers_O3a_IAS_pipeline/tree/main",
          "waveform_family": "IMRPhenomXPHM",
          "parameters": [

needs to be:

      "pe_sets": [
        {
          "pe_set_name": "IASPrior",
          "data_url": "https://github.com/seth-olsen/new_BBH_mergers_O3a_IAS_pipeline/tree/main",
          "waveform_family": "IMRPhenomXPHM",
          "preferred": true,         <--- This needs to be true for one and only one pe_set in the list.
          "parameters": [
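Points (7) and (9) could be checked automatically before upload. The following is only a sketch: the helper name is hypothetical, and it assumes that "search" and "pe_sets" are lists sitting directly on each event object, as the snippets above suggest.

```python
def check_preferred(event):
    """Return a list of problems with the 'preferred' flags of one event dict."""
    problems = []
    for key in ("search", "pe_sets"):
        entries = event.get(key, [])
        # Count entries whose "preferred" is literally true.
        n_preferred = sum(1 for entry in entries if entry.get("preferred") is True)
        if entries and n_preferred != 1:
            problems.append(
                f"{key}: expected exactly one preferred entry, found {n_preferred}"
            )
    return problems


event = {
    "search": [{"pipeline_name": "IAS"}],
    "pe_sets": [{"pe_set_name": "IASPrior", "preferred": True}],
}
print(check_preferred(event))
```

For the event above, only the "search" list is reported, since its single entry has no "preferred" key.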

10) The "event_description" cannot be "null"! ... Our event model does not allow null as the character value. It does, however, allow a blank value, i.e., set it to "":

      ],
      "event_description": null      <--- NOT ALLOWED VALUE
    },
    {
      "event_name": "GW190707_083226",
      "gps": 1246523564.9,
      "detectors": [
        "H1",
        "L1"
      ],

should be:

      ],
      "event_description": ""    <---- BLANK is ALLOWED!!
    },
    {
      "event_name": "GW190707_083226",
      "gps": 1246523564.9,
      "detectors": [
        "H1",
        "L1"
      ],
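As a stopgap for (10), null descriptions could be blanked before ingestion. This is a minimal sketch with a hypothetical helper name; it assumes the events are already loaded as a list of Python dicts (e.g. via `json.load`), so JSON `null` appears as `None`.

```python
def blank_null_descriptions(events):
    """Replace a null "event_description" with "" in place; return how many were fixed."""
    fixed = 0
    for event in events:
        # Only touch events where the key is present and explicitly null.
        if "event_description" in event and event["event_description"] is None:
            event["event_description"] = ""
            fixed += 1
    return fixed


events = [
    {"event_name": "GW190707_083226", "event_description": None},
    {"event_name": "GW190707_083226", "event_description": ""},
]
print(blank_null_descriptions(events), events[0]["event_description"] == "")
```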

11) In the example.json file provided by Javier, it looks like the usage of the "data_url" is incorrect according to the documentation (and the way it is used historically in GWOSC). It also looks like the "pe_sets":[{"links":[{"url": "https://..."}]}] value is what should be in the "data_url" value.

12) In the schema.json file (https://github.com/gwosc-tutorial/gwosc-catalog/blob/main/schema.json#L38) the opening square bracket is missing! I.e.,

          "pe_sets":
            {

should be:

          "pe_sets":[
            {

NOTE: Javier did catch this and corrected in his example.json file!

13) I found examples of sigfigs being set to 0 and -1 in the file ias-o3a.json when they should not be (see lines 707 and 519 in the file). I have reported this to Javier.

REFERENCE: https://github.com/gwosc-tutorial/gwosc-catalog/blob/main/README.md

martinberoiz commented 4 months ago

Regarding 5., it should be no problem. A catalog can have events from any run or across runs and it all should work.

12. is a typo and it should be a separate ticket. I'll open one for this.

jroulet commented 4 months ago

Regarding 13., I found why sometimes I was getting negative significant figures.

The code looks at the position of the first nonzero digit of the median value, and that of the error (the smallest error, if the +/- values are different). The number of significant digits is obtained as the difference between the two positions, plus one. E.g. for 1234 +/- 56, the first digit happens at position -3 for the value and -1 for the error, -1 - (-3) + 1 = 3 significant figures ("123" in "1234" are significant).

If the median value is smaller than the error this logic can give negative significant figures. E.g. 1 +/- 56 would give 0 significant figures and 0.1 +/- 56 would give -1, etc. A fix is to put a minimum of 1 significant figure in the code, but before implementing this I wanted to ask if this is indeed the desired behavior. How exactly is sigfigs used downstream?
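For reference, the logic described above can be sketched as follows. This is a reconstruction from the description, not the actual catalog code; the function names and the optional `minimum` argument (the proposed clamp) are assumptions for illustration.

```python
import math


def first_digit_position(x):
    """Position of the first nonzero digit: -3 for 1234, -1 for 56, 0 for 1, 1 for 0.1.

    Raises ValueError for x == 0, which has no nonzero digit.
    """
    return -math.floor(math.log10(abs(x)))


def significant_figures(median, err_plus, err_minus, minimum=None):
    """Difference between the error and value digit positions, plus one."""
    err = min(abs(err_plus), abs(err_minus))  # use the smallest error
    n = first_digit_position(err) - first_digit_position(median) + 1
    # Optional clamp, corresponding to the proposed "minimum of 1" fix.
    return n if minimum is None else max(n, minimum)


print(significant_figures(1234, 56, 56))            # 3
print(significant_figures(1, 56, 56))               # 0
print(significant_figures(0.1, 56, 56))             # -1
print(significant_figures(0.1, 56, 56, minimum=1))  # 1
```

Without the clamp the three examples from the comment give 3, 0 and -1; with `minimum=1` the last two would both become 1.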