oceansites / dmt

Activities of the OceanSITES Data Management Team
http://www.oceansites.org/data

Supply JCOMMOPS with metadata to improve OS metrics #51

Open ngalbraith opened 4 years ago

ngalbraith commented 4 years ago

The JCOMMOPS data portal (JDP) is built on a database that uses a schema that is very unlike the OceanSITES NetCDF specification.

Using the JDP is driven not by the science behind OceanSITES but by funding and visibility: it promotes the usefulness of OceanSITES as an integral part of the global ocean observing effort. For this reason, the DMT will propose modifications or additions to our format specification so that we are fairly represented on the JDP.

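Different versions of the request for new/changed metadata fields from JCOMMOPS have been provided:

JCOMMOPS metadata request, aka NEW REQUIREMENTS document: https://drive.google.com/file/d/1xOPKw2Vb20yrCyUCkXhIH5mGdRJoQN6m/view?usp=sharing

TranslatingJCOMMOPS2OceanSITES_3.docx: https://github.com/oceansites/dmt/files/3606673/TranslatingJCOMMOPS2OceanSITES_3.docx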

1: Sensor/Instrument metadata fields

Note that in the above documents, the term "sensor" actually represents an instrument. This means that we need to provide metadata on, e.g., an SBE-37IM, not on its T or C sensors. Because we have a lot of two-dimensional data (temperature at various depths in one file), we will (probably) need to implement the new metadata fields as variables, not as attributes. This method is already in our DFRM (http://www.oceansites.org/docs/oceansites_data_format_reference_manual.pdf), and we'll just add some fields.

Sample CDL for a temperature record, with new JDP fields:

    double TEMP(TIME, DEPTH) ;
        TEMP:instrument = "T_INST" ;

    int T_INST ;
        T_INST:long_name = "instruments" ;
        T_INST:ancillary_variables = "INST_MFGR INST_MODEL INST_SN INST_URL INST_MOUNT" ;
    char INST_SeaVoX_L22_code(DEPTH, strlen2) ;    // NEW
        INST_SeaVoX_L22_code:long_name = "SeaDataNet code" ;
    char INST_MFGR(DEPTH, strlen1) ;
        INST_MFGR:long_name = "instrument manufacturer" ;
    char INST_MODEL(DEPTH, strlen2) ;
        INST_MODEL:long_name = "instrument model name" ;
    int INST_SN(DEPTH) ;
        INST_SN:long_name = "instrument serial number" ;
    char INST_URL(DEPTH, strlen3) ;
        INST_URL:long_name = "instrument reference URL" ;
    char INST_MOUNT(DEPTH, strlen3) ;
        INST_MOUNT:long_name = "instrument mount" ;

    data:
    T_INST = _ ;    // an empty variable, aka an umbrella
    INST_MFGR = "RBR-Global ", "Sea-Bird Electronics", "Sea-Bird Electronics" ;
    INST_MODEL = "RBR TR-1060P", "Sea-Bird SBE 37-SI", "Sea-Bird SBE 16" ;
    INST_SeaVoX_L22_code = "TOOL0728", "TOOL0021", "TOOL0023" ;
    INST_MOUNT = "mounted_on_surface_buoy", "mounted_on_mooring_line", "mounted_on_seafloor_structure_riser" ;
    INST_SN = 14875, 1325, 1328 ;
    // note: these will actually be URLs on the JDP
    INST_URL = "http://www.rbr-global.com/products/tr-1060-temperature",
               "http://www.seabird.com/products/spec_sheets/37smdata.htm",
               "http://www.seabird.com/16plus_ReferenceSheet.pdf" ;
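To illustrate how the per-depth variables line up, here is a minimal Python sketch. It assumes the arrays have already been read from the file into plain lists (for example with a NetCDF library, which is not shown). The values are the ones in the sample above; the `Instrument` class and the `instruments_by_index` function are hypothetical names for this illustration, not part of the OceanSITES format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    manufacturer: str
    model: str
    serial_number: int
    l22_code: str
    mount: str


def instruments_by_index(mfgr, model, sn, l22, mount):
    """Combine parallel per-DEPTH arrays into one record per depth index."""
    lengths = {len(mfgr), len(model), len(sn), len(l22), len(mount)}
    if len(lengths) != 1:
        raise ValueError("all instrument variables must share the DEPTH dimension")
    # Strip the blank padding that fixed-width char variables carry.
    return [
        Instrument(m.strip(), mod.strip(), s, code.strip(), mnt.strip())
        for m, mod, s, code, mnt in zip(mfgr, model, sn, l22, mount)
    ]


# Values copied from the sample CDL.
records = instruments_by_index(
    ["RBR-Global ", "Sea-Bird Electronics", "Sea-Bird Electronics"],
    ["RBR TR-1060P", "Sea-Bird SBE 37-SI", "Sea-Bird SBE 16"],
    [14875, 1325, 1328],
    ["TOOL0728", "TOOL0021", "TOOL0023"],
    ["mounted_on_surface_buoy", "mounted_on_mooring_line",
     "mounted_on_seafloor_structure_riser"],
)
```

Because every variable shares the DEPTH dimension, index `i` of each array describes the instrument that produced `TEMP(:, i)`, so the link between depth and serial number is explicit rather than implied.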

ngalbraith commented 4 years ago
2: Deployment and recovery information

JDP wants explicit ship/cruise information to help support ship operators. This metadata also provides access, in some cases, to related shipboard data.

There are different descriptions of the Expocode online. The description at https://exchange-format.readthedocs.io/en/latest/parameters.html#expocode reads: "Usual generation formula is ICES 4 character platform code then the cruise departure date in YYYYMMDD format".

Note: the departure date is not always readily available! I have not found a server that can provide these, though knowing the ICES ship code and the deployment date should enable them to be found. I propose we continue to use the cruise names generated by the ships' operators, because they can be used to find relevant data (e.g., on the R2R site, https://www.rvdata.us/).

| Attribute | Example | Description |
| --- | --- | --- |
| platform_deployment_date | "2006-03-01T00:00:00Z" | Date and time, in ISO format, of the deployment of the buoy or other platform (JCOMMOPS) |
| platform_deployment_ship_ICES_code | "318M" | Codes at https://ocean.ices.dk/codes/ShipCodes.aspx |
| platform_deployment_cruise_name | "R/V Melville TUIM10MV" | Cruise name assigned by the ship operator |
| platform_deployment_cruise_Expocode | "318M20060222" | ICES ship code plus cruise start date |
| platform_recovery_date | "2007-03-01T00:00:00Z" | Date and time, in ISO format, of the recovery of the buoy or other platform (JCOMMOPS) |
| platform_recovery_ship_ICES_code | "318M" | Codes at https://ocean.ices.dk/codes/ShipCodes.aspx |
| platform_recovery_cruise_name | "R/V Melville TUIM22MV" | Cruise name assigned by the ship operator |
| platform_recovery_cruise_Expocode | "318M20070225" | ICES ship code plus cruise start date |
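
The Expocode formula quoted above (ICES platform code followed by the cruise departure date as YYYYMMDD) can be sketched in a few lines of Python. The function name `make_expocode` is hypothetical; the ship code and dates are the ones in the examples above.

```python
from datetime import date


def make_expocode(ices_ship_code: str, departure: date) -> str:
    """Build an Expocode: 4-character ICES platform code + YYYYMMDD departure date."""
    if len(ices_ship_code) != 4:
        raise ValueError("expected a 4-character ICES platform code")
    return f"{ices_ship_code}{departure:%Y%m%d}"


# Ship code and cruise start dates taken from the attribute examples.
deployment_expocode = make_expocode("318M", date(2006, 2, 22))
recovery_expocode = make_expocode("318M", date(2007, 2, 25))
```

Note that the input is the cruise departure date, not the platform deployment date, which is why the Expocode cannot be derived from platform_deployment_date alone.
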
petejan commented 4 years ago

I have been using the variable::attribute approach to record sensor information, because I find it easier to read and to explain to data end users. Where I have multiple instances (depths) of instruments recording, say, sea_water_temperature, I separate them with ";", so in your example: TEMP:sensor_model = "TR1060 ; SBE37 ; SBE16 " ; One disadvantage of this method is that sensor_model gives no real connection between the DEPTH and the sensor serial number, apart from an implied ordering.
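As a rough illustration of the implied ordering, the following Python sketch splits such an attribute and pairs the entries with depths by position. The function names and the depth values are hypothetical; only the attribute string comes from the example above.

```python
def split_attribute(value: str) -> list[str]:
    """Split a ';'-separated attribute into trimmed, non-empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


def pair_with_depths(attribute_value: str, depths: list[float]) -> dict[float, str]:
    """Pair list entries with depths purely by position (the implied ordering)."""
    entries = split_attribute(attribute_value)
    if len(entries) != len(depths):
        raise ValueError(f"{len(entries)} entries for {len(depths)} depths")
    return dict(zip(depths, entries))


# Attribute value from the example; the depths are hypothetical.
models = pair_with_depths("TR1060 ; SBE37 ; SBE16 ", [1.0, 10.0, 40.0])
```

The length check is the only safeguard available here: if an entry is missing or the list is written in a different order from DEPTH, nothing in the file itself reveals the mismatch.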

As for matching the sea_water_temperature and sea_water_practical_salinity sensors, matching the make and serial number attributes should be easy using a database approach when reading them into the JCOMMOPS structures.
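A minimal Python sketch of that matching step, assuming the make and serial number have already been extracted from each variable's attributes into rows. The function name and the row layout are hypothetical, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_by_instrument(sensor_rows):
    """Group (variable, make, serial_number) rows into instruments keyed by (make, serial)."""
    instruments = defaultdict(list)
    for variable, make, serial in sensor_rows:
        instruments[(make.strip(), str(serial).strip())].append(variable)
    return dict(instruments)


# Hypothetical rows, as they might be extracted from variable attributes.
rows = [
    ("sea_water_temperature", "Sea-Bird Electronics", "1325"),
    ("sea_water_practical_salinity", "Sea-Bird Electronics", "1325"),
    ("sea_water_temperature", "RBR-Global", "14875"),
]
instruments = group_by_instrument(rows)
```

Variables that share a make and serial number end up under one key, which corresponds to one instrument in the sense used by the JCOMMOPS request.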

For OceanSITES netCDF files without the complete sensor information or attributes, I think JCOMMOPS should just use the standard_name as the (generic) sensor information. I don't see this as a disadvantage for the funding/visibility objectives of JCOMMOPS.


hmsnaith commented 4 years ago

Hi all

I also prefer this approach, using the variable attribute, as I find it much clearer when reading the files. I find the separate-variable approach quite confusing.

Helen

ngalbraith commented 4 years ago

Using semicolon-separated lists is fine, but when you get up to about 20 instruments, they can be very hard to parse. Also, in some cases with met data, where we deploy redundant instruments, we patch in data from more than one sensor; this is not easy to document clearly with the list approach. As far as I know, we're fine having two different ways to document instruments in our format.

I can imagine a script, running on the THREDDS server, that queries our existing NetCDF files; for files that don't contain instrument info but do have seawater temperature, it could insert a 'default' instrument make/model from the BODC/SDN list, something like 'generic temperature recorder' or, if S is also present, 'generic T/S recorder'. The output could be fed into the database, eliminating the need to add this info to our NetCDF files. Of course, having this metadata explicitly stated in the NetCDF is preferable, but this seems like a reasonable "80% solution".
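The decision logic of such a script might look like the Python sketch below. It assumes the set of standard_name values for a file has already been collected; the function name is hypothetical, and the returned labels are the placeholder wording used above, not actual entries from the BODC/SDN list.

```python
from typing import Optional

TEMPERATURE = "sea_water_temperature"
SALINITY = "sea_water_practical_salinity"


def default_instrument(standard_names: set[str], has_instrument_info: bool) -> Optional[str]:
    """Return a generic instrument label, or None when no default applies."""
    # Files that already document their instruments are left alone.
    if has_instrument_info or TEMPERATURE not in standard_names:
        return None
    if SALINITY in standard_names:
        return "generic T/S recorder"
    return "generic temperature recorder"
```

For example, `default_instrument({"sea_water_temperature"}, False)` yields the temperature-only label, while a file that already carries instrument metadata yields `None` and is passed through unchanged.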

ngalbraith commented 4 years ago

One more point: the concept of using a variable to contain instrument information isn't ours; we adopted it from the US NCEI (formerly NODC). They introduced the idea when they were developing their NetCDF templates, as far as I know.

ngalbraith commented 4 years ago

> For OceanSITES netCDF files without the complete sensor information or attributes, I think JCOMMOPS should just use the standard_name as the (generic) sensor information

This is a really good idea, Helen. It would be fairly easy to generate the missing information, using a script to see what data variables are in each file.