nvs-vocabs / ArgoVocabs

A repository for the management of issues related to vocabularies managed by the Argo Data Management Team

How to indicate multiple SENSOR or PARAMETER #81

Open apswong opened 8 months ago

apswong commented 8 months ago

This issue is somewhat related to R03 and R25:

When a float carries multiple sensors of the same type and thus generates multiple values for the same parameter, the current practice is to indicate the additional sensors/parameters by placing an integer "n" at the end of the char string. For example, if a float carries three oxygen sensors, then the three parameters are DOXY, DOXY2, DOXY3.

This practice is problematic when the char string ends in a numeral, e.g. DOWN_IRRADIANCE412.

What are the possible solutions? Many old files have already been written with the old practice.

HCBScienceProducts commented 8 months ago

When the parameter string ends with a numeral, e.g., BBP<nnn>, the integer "n" at the end is separated by an underscore, e.g., BBP700 for the first and BBP700_2 for a second BBP700 parameter.
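As a minimal sketch of the naming rule described in this thread (append "n" directly, or "_n" when the name already ends in a numeral), the following Python helper builds the name of the n-th replicate. The function name `replicate_name` is hypothetical and is not part of any Argo tool.

```python
def replicate_name(base: str, n: int) -> str:
    """Name of the n-th instance of `base` under the practice described above.

    Assumption: the first instance (n == 1) carries no suffix at all.
    """
    if n < 1:
        raise ValueError("instance number must be >= 1")
    if n == 1:
        return base
    # Use an underscore only when the base name already ends in a numeral.
    separator = "_" if base[-1:].isdigit() else ""
    return f"{base}{separator}{n}"


if __name__ == "__main__":
    print(replicate_name("DOXY", 3))
    print(replicate_name("BBP700", 2))
```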

HCBScienceProducts commented 8 months ago

Examples are CSIRO floats 1901348 and 5905165/5905395 with both ECO_BB3 and MCOMS_FLBBCD sensors. Each sensor has a BBP700 channel.

SBS-EREHM commented 8 months ago

FYI, this issue is a duplicate of https://github.com/nvs-vocabs/R25/issues/1

@HCBScienceProducts' underscore trick works, I think, as long as we agree to never ever make a SENSOR (sensor type) that ends in _<nnn>, i.e., the _<nnn> suffix is reserved for identical sensor instance nnn >= 2.

vpaba commented 2 months ago

@HCBScienceProducts should this recommendation (for duplicate SENSOR) be added to the WIP manual, if not there already? @mscanderbeg

Otherwise, the issue can be closed

apswong commented 2 months ago

@vpaba @tcarval This recommendation (use an underscore to separate the integer n when the parameter string ends with a number) needs to be agreed on, then entered into the Argo Users Manual and the GDAC File Checker.

richardsc commented 2 months ago

With full acknowledgment that what I'm about to suggest could have retroactive consequences for all previous parameters that didn't end with a numeral, for consistency of implementation going forward the proposal could be to always use an underscore to separate additional sensors, e.g. BBP700_2 or DOXY_2. This eliminates the uncertainty over whether a numeral is part of the parameter name or not (but would potentially require a lot of changing of old names ...).

apswong commented 2 months ago

I agree with @richardsc. I prefer Clark's solution: going forward, the new naming convention is to always use an underscore to separate additional sensors. Old files that contain the old naming convention (without the underscore) can remain on the GDACs, but new files should use the new naming convention.

SBS-EREHM commented 2 months ago

I completely agree with @richardsc. Currently, in the proposed JSON metadata schema, NVS controlled vocabulary terms (including those for SENSOR or PARAMETER) are specified by SDN URI, e.g., SDN:R25::CTD_PRES or SDN:R03::PRES. Parsing and validating controlled terms becomes problematic when there are multiple instances of a SENSOR or PARAMETER. Imagine a float deployed with both MCOMS and RBR Tridente FLBBCD sensors with BBP532. In the current scheme we would have:

To properly validate the controlled term for the second sensor or parameter, we need to:

  1. Understand that this is a sensor with a wavelength in its controlled term
  2. Parse the URI for the wavelength
  3. Have "inside knowledge" that the wavelength is only 3 digits
  4. Then remove the 4th digit
  5. Then validate the controlled term (SDN:R25::BACKSCATTERINGMETER_BBP532, SDN:R03::BETA_BACKSCATTERING700) using the NVS controlled vocabulary

@richardsc's proposal would make it a bit easier, assuming there is NEVER EVER an NVS controlled term that ends in _N where N=2,3,4,5,...

This is still problematic. Many R25 and R03 entries contain an underscore, so the parsing rule has to look for a final underscore followed by a (single?) digit, e.g., a messy regular expression match with (in this case) the first capture group capturing the controlled term and the fourth capture group capturing the instance number:

^SDN:R25::((?:([A-Z]+[0-9]*)*(?:_[A-Z]+[0-9]*)))?(_(\d))?$

I propose something easier. We use a double underscore __N that easily and unambiguously delimits the NVS controlled term (SDN:R25::BACKSCATTERINGMETER_BBP532, SDN:R03::BETA_BACKSCATTERING700) from the instance number (N). So I can simply look for a double underscore.

If no double underscore, the NVS term is ready to go:

If double underscore is found, then parse all to the left as the NVS controlled term and all to the right as the instance number
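A minimal Python sketch of the double-underscore rule proposed above, assuming it were adopted (it is only a proposal in this thread). The function name `split_instance` is hypothetical, and treating a term without a suffix as instance 1 is an assumption made for illustration.

```python
from typing import Tuple


def split_instance(term: str) -> Tuple[str, int]:
    """Split 'TERM__N' into (TERM, N); a term without '__' is returned unchanged.

    Assumption: a term without a double underscore is instance 1.
    """
    base, separator, suffix = term.rpartition("__")
    if not separator:
        # No double underscore: the whole string is the controlled term.
        return term, 1
    if not suffix.isdigit():
        raise ValueError(f"instance suffix is not a number: {term!r}")
    return base, int(suffix)


if __name__ == "__main__":
    print(split_instance("BACKSCATTERINGMETER_BBP532__2"))
    print(split_instance("CTD_PRES"))
```

The left-hand part can then be looked up in the NVS vocabulary as-is, without any knowledge of wavelengths or digit counts.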

Comments?

HCBScienceProducts commented 2 months ago

@SBS-EREHM: Please check on the previous posts. The current scheme is to add an underscore if the name ends in a numeral, so your given examples aren't correct.

The current practice is that:

When a float carries multiple sensors of the same type and thus generates multiple values for the same parameter, the additional sensors/parameters are indicated by placing an integer "n" at the end of the char string. For example, if a float carries three oxygen sensors, then the three parameters are DOXY, DOXY2, DOXY3.

When the parameter string ends with a numeral because the parameter string contains a wavelength indication <nnn>, e.g., BBP<nnn>, then the integer "n" at the end is separated by an underscore, e.g., BBP700 for the first and BBP700_2 for a second BBP700 parameter.

(to be agreed on or modified by the current discussion)

Parsing the current practice is (unambiguously) feasible: By starting from the back :-)

  • How many numerals are at the end of the string?
    • 0 -> standard parameter name, e.g., DOXY
    • 1+ -> Are they preceded by a single underscore?
      • Yes -> numerals indicate the parameter number; everything before the underscore is a standard parameter name (with a wavelength <nnn> at the end), e.g., BBP700_2
      • No -> Are there fewer than 3 numerical digits?
        • Yes -> numerals indicate the parameter number; everything before the numerals is a standard parameter name (without a wavelength <nnn> at the end), e.g., DOXY2
        • No -> numerals indicate a wavelength <nnn>; everything is part of a standard parameter name, e.g., BBP700

This parsing implicitly assumes for the current practice that:

  1. Parameter names end with numerals only to indicate a wavelength <nnn>, which is always 3(+) numerical digits
  2. There are at most 99 replicate parameter names for parameters that don't end with a numeral, i.e., DOXY, DOXY2, DOXY3, ... DOXY99

Number (1.) is presently true. Number (2.) seems a reasonable assumption to me.
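The decision tree above can be sketched in Python as follows. This is an illustration under the two stated assumptions (wavelengths have 3 or more digits, at most 99 replicates), not the GDAC File Checker's logic. The function name `parse_current_practice` is hypothetical, and reporting an unsuffixed name as instance 1 is an assumption.

```python
from typing import Tuple


def parse_current_practice(name: str) -> Tuple[str, int]:
    """Return (standard parameter name, instance number), parsing from the back."""
    stem = name.rstrip("0123456789")
    digits = name[len(stem):]
    if not digits:
        # No trailing numerals: standard parameter name, e.g. DOXY.
        return name, 1
    if stem.endswith("_"):
        # Numerals preceded by an underscore: instance number, e.g. BBP700_2.
        return stem[:-1], int(digits)
    if len(digits) < 3:
        # Fewer than 3 digits without underscore: instance number, e.g. DOXY2.
        return stem, int(digits)
    # 3 or more digits: a wavelength that belongs to the name, e.g. BBP700.
    return name, 1


if __name__ == "__main__":
    for example in ("DOXY", "DOXY2", "BBP700", "BBP700_2"):
        print(example, parse_current_practice(example))
```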

@apswong: While I see the merit of @richardsc's suggestion to simplify the decision tree sketched above by always adding a single underscore, I would prefer a solution that gives a consistent Argo data set throughout, not one that has different practices for different floats over time. So I'd be in favour of either formalizing and keeping the current practice, or opting for @richardsc's suggestion to always add an underscore but then requiring existing files to be modified accordingly.

As a reminder, CSIRO floats 1901348 and 5905165/5905395 carry both ECO_BB3 and MCOMS_FLBBCD sensors, and each sensor has a BBP700 channel. They serve as an example of how things look (and work) at present.

Examples from the current 1901348_meta.nc file (highlights in bold are by me):

PARAMETER = "TEMP ", "PSAL ", "PRES ", "DOXY ", "TEMP_DOXY ", "PHASE_DELAY_DOXY ", "TEMP_VOLTAGE_DOXY ", "CHLA ", "FLUORESCENCE_CHLA ", "BBP700 ", "BETA_BACKSCATTERING700 ", "TEMP_CPU_CHLA ", "FLUORESCENCE_CDOM ", "CDOM ", "TRANSMITTANCE_PARTICLE_BEAM_ATTENUATION660 ", "CP660 ", "BBP700_2 ", "BETA_BACKSCATTERING700_2 ", "BBP532 ", "BETA_BACKSCATTERING532 ", "BBP470 ", "BETA_BACKSCATTERING470 ", "DOWN_IRRADIANCE412 ", "RAW_DOWNWELLING_IRRADIANCE412 ", "DOWN_IRRADIANCE443 ", "RAW_DOWNWELLING_IRRADIANCE443 ", "DOWN_IRRADIANCE490 ", "RAW_DOWNWELLING_IRRADIANCE490 ", "DOWN_IRRADIANCE555 ", "RAW_DOWNWELLING_IRRADIANCE555 ", "UP_RADIANCE412 ", "RAW_UPWELLING_RADIANCE412 ", "UP_RADIANCE443 ", "RAW_UPWELLING_RADIANCE443 ", "UP_RADIANCE490 ", "RAW_UPWELLING_RADIANCE490 ", "UP_RADIANCE555 ", "RAW_UPWELLING_RADIANCE555 " ; }

and correspondingly

PARAMETER_SENSOR = "CTD_TEMP ", "CTD_CNDC ", "CTD_PRES ", "OPTODE_DOXY ", "OPTODE_DOXY ", "OPTODE_DOXY ", "OPTODE_DOXY ", "FLUOROMETER_CHLA ", "FLUOROMETER_CHLA ", "BACKSCATTERINGMETER_BBP700 ", "BACKSCATTERINGMETER_BBP700 ", "BACKSCATTERINGMETER_BBP700 ", "FLUOROMETER_CDOM ", "FLUOROMETER_CDOM ", "TRANSMISSOMETER_CP660 ", "TRANSMISSOMETER_CP660 ", "BACKSCATTERINGMETER_BBP700_2 ", "BACKSCATTERINGMETER_BBP700_2 ", "BACKSCATTERINGMETER_BBP532 ", "BACKSCATTERINGMETER_BBP532 ", "BACKSCATTERINGMETER_BBP470 ", "BACKSCATTERINGMETER_BBP470 ", "RADIOMETER_DOWN_IRR412 ", "RADIOMETER_DOWN_IRR412 ", "RADIOMETER_DOWN_IRR443 ", "RADIOMETER_DOWN_IRR443 ", "RADIOMETER_DOWN_IRR490 ", "RADIOMETER_DOWN_IRR490 ", "RADIOMETER_DOWN_IRR555 ", "RADIOMETER_DOWN_IRR555 ", "RADIOMETER_UP_RAD ", "RADIOMETER_UP_RAD412 ", "RADIOMETER_UP_RAD443 ", "RADIOMETER_UP_RAD443 ", "RADIOMETER_UP_RAD490 ", "RADIOMETER_UP_RAD490 ", "RADIOMETER_UP_RAD555 ", "RADIOMETER_UP_RAD555 " ; }

(There's a cookie to win for the first one to find the one entry that's off in the 1901348 meta entries ;-) )