wmo-im / wmds

WIGOS Metadata Standard: Semantic standard and code tables

Discussion on Gas phase variables #261

Closed rhornbrook closed 3 years ago

rhornbrook commented 3 years ago

Branch [add when created]

Summary and Purpose: Discussion of the variables and vocabulary of the Atmosphere/Gas/ subdomains.

Stakeholder(s): WG ACV

Proposal: Clarify some broad questions regarding the vocabulary.

Reason: Just starting the discussion.

rhornbrook commented 3 years ago

Questions for the team:

  1. Are non-alphanumeric characters acceptable in variable names, e.g. parentheses, commas, dashes, superscripts, subscripts?
  2. Are upper and lower case distinguishable and required?
  3. What is the reasoning behind the "notation" order of species? It seems very random.
  4. Is there a preference between common names and IUPAC names? What about the rarer names, e.g. MEK or butanone? Isobutane or 2-methylpropane? cis-/trans- or (E)-/(Z)- notation?
  5. What is the philosophy behind subdomain_2 (i.e., Reactive Gas, Ozone, Greenhouse Gas, Other Gas, etc.)? Some species fit into multiple subdomains, so I would lean towards not including that category.

rhornbrook commented 3 years ago

Regarding identification of variables, which, if any, of the following can be included in the table?

  a. Notation
  b. Path
  c. Chemical formula
  d. Structural formula
  e. Molecular weight
  f. Name
  g. Other name
  h. CAS Registry Number
  i. Unique species or mixture of species, including:
     i. Different chemical formulae?
     ii. Structural isomers? (e.g., 1-butene and isobutene)
     iii. Stereoisomers? (e.g., D-/L-/R/+-)
  j. IUPAC Standard InChI
  k. IUPAC Standard InChIKey
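
To make the question concrete, here is a minimal sketch of what one table row could look like if most of the fields above were included. The class name `GasVariable`, the field names, and the `matches` helper are all hypothetical, and the notation and path values are placeholders rather than real WMDS entries. The structural formula is left out only to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GasVariable:
    """One hypothetical row of the gas phase variable table."""
    notation: str                      # opaque unique identifier of the entry
    path: str                          # position in the domain hierarchy
    name: str                          # primary name
    other_names: List[str] = field(default_factory=list)
    chemical_formula: Optional[str] = None
    molecular_weight: Optional[float] = None  # g/mol
    cas_number: Optional[str] = None
    inchi: Optional[str] = None        # left empty when not yet looked up
    inchikey: Optional[str] = None


def matches(variable: GasVariable, query: str) -> bool:
    """Case-insensitive lookup on the name or any alias."""
    q = query.lower()
    names = [variable.name] + variable.other_names
    return any(q == n.lower() for n in names)


# Placeholder notation and path: assumptions for illustration only.
isobutane = GasVariable(
    notation="0000",
    path="\\Atmosphere\\Gas\\...",
    name="isobutane",
    other_names=["2-methylpropane"],
    chemical_formula="C4H10",
    molecular_weight=58.12,
    cas_number="75-28-5",
)
```

With a layout like this, a user searching for either "isobutane" or "2-methylpropane" reaches the same entry, which is one way to sidestep the common name vs. IUPAC name choice.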

fstuerzl commented 3 years ago

Thanks for opening this issue, @rhornbrook. I'll try to answer or comment on some of your questions:

  2. In the variable name (label) and the description, upper and lower case are distinguishable.

  3. The notation is the unique identifier of a variable (or of any element in a code list); in this case it corresponds to the variable ID used in the OSCAR/Surface database. That means the number may be higher the later the variable was added, but semantically the notation has no meaning other than to distinguish entries in the table.

  4. We had a similar discussion in an issue for the last fast track (#213), in which @gaochen-larc and @joergklausen also commented on this topic in general:

    One general question: do we really need the chemical formula before the parenthesis? As shown in the table, one formula can represent multiple molecules. The content in the parenthesis actually identifies the compound. We may need to discuss this topic a little bit. Also, should we just put one name in the name column and put the alias in the description, or can we create another column for aliases...

Chemical formulas are included to help find related chemicals in an ordered list. I would also recommend staying with the current use of more than one name. These are common names that people relate to. It's all in an effort to provide assistance to users wanting to find a specific molecule.

  5. I think the grouping in subdomain_2 is based on the organisation of GAW. Ozone, Greenhouse Gases and Reactive Gases are three of the focal areas listed in the GAW implementation plan. But that does not mean the categories are fixed and can't be changed (@joergklausen?).

gaochen-larc commented 3 years ago

Assuming the definition of Reactive Gases is based on chemical lifetime, the category is difficult to define, since the chemical lifetimes of reactive species vary in space and time.

jbnowak-larc commented 3 years ago

Related to @fstuerzl's response to 4, I noticed that ammonia is the only gas phase species where the chemical formula is in parentheses, not the name. Is this a mistake?

I also agree that subdomain_2 is problematic because of the overlap and the interpretative nature of the categories.

dkubisti commented 3 years ago

Harmonisation of the subdomains: Currently, the different levels of the subdomain are not treated uniformly, e.g. for Reactive Gases the following levels are either another gas group or the species itself (Reactive Gas/VOC/C2H2 vs Reactive Gas/ROOH).

I agree with the comments above that the first subdomain within \Atmosphere\Gas\ (e.g. Greenhouse Gas, Ozone, Reactive Gas, ...) seems "arbitrary"; the structure seems to have grown historically within the GAW Focus Group. In any case, any audience would, from its own perspective, wish for a different substructure. Therefore the substructure could be defined more on a functional group level, maybe as flat as possible, as is often done in numerical models, and in parallel each species should be allowed to have metadata that allows different views on the 'zoo' of trace gases. I like the idea of r.hornbrook to assign to each species

A possible structure could be based on the atoms in the gas, e.g. a hierarchical ordering of gases containing O, H, N, C, halogens, sulfur, mercury, ...

Then, in addition to the attributes that R. Hornbrook specified, the metadata table should also contain commonly used attributes like "ReactiveGases" or "GreenhouseGases", so that someone who looks for GreenhouseGases gets all species considered to be climate-active trace gases, and someone looking for the traditional "ReactiveGases" would find their historical set of species.

gaochen-larc commented 3 years ago

    Harmonisation of the subdomains: Currently, the different levels of the subdomain are not treated uniformly, e.g. for Reactive Gases the following levels are either another gas group or the species itself (Reactive Gas/VOC/C2H2 vs Reactive Gas/ROOH).

    I agree with the comments above that the first subdomain within \Atmosphere\Gas\ (e.g. Greenhouse Gas, Ozone, Reactive Gas, ...) seems "arbitrary"; the structure seems to have grown historically within the GAW Focus Group. In any case, any audience would, from its own perspective, wish for a different substructure. Therefore the substructure could be defined more on a functional group level, maybe as flat as possible, as is often done in numerical models, and in parallel each species should be allowed to have metadata that allows different views on the 'zoo' of trace gases. I like the idea of r.hornbrook to assign to each species

    A possible structure could be based on the atoms in the gas, e.g. a hierarchical ordering of gases containing O, H, N, C, halogens, sulfur, mercury, ...

    Then, in addition to the attributes that R. Hornbrook specified, the metadata table should also contain commonly used attributes like "ReactiveGases" or "GreenhouseGases", so that someone who looks for GreenhouseGases gets all species considered to be climate-active trace gases, and someone looking for the traditional "ReactiveGases" would find their historical set of species.

I generally agree with @dkubisti's comments. I think the functional group approach will work, as @dkubisti mentioned. Certainly there will be details that need to be worked out. I also like the idea that one species can have multiple tags, like "GreenhouseGases" and "ReactiveGases", and maybe even tags for various tracers...

rhornbrook commented 3 years ago

    A possible structure could be based on the atoms in the gas, e.g. a hierarchical ordering of gases containing O, H, N, C, halogens, sulfur, mercury, ...

I think this makes sense. Allow the gases to sort themselves based on their atomic composition. It will, of course, begin to get complicated once we are dealing with molecules with multiple atom types.
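
As a rough illustration of where the complication shows up, the sketch below extracts element symbols from a formula string and assigns atom-based groups. The function names `elements` and `atom_groups` are hypothetical, and the regular expression is an assumption that only works for simple formula strings: it treats every capital letter (plus an optional lowercase letter) as an element symbol, so a label like D for deuterium in CH3D would be read as its own element.

```python
import re

HALOGENS = {"F", "Cl", "Br", "I"}


def elements(formula):
    """Return the set of element symbols found in a simple formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def atom_groups(formula):
    """List every atom-based group a species would fall into."""
    found = elements(formula)
    groups = sorted(found - HALOGENS)
    if found & HALOGENS:
        groups.append("Halogens")
    return groups


for formula in ("O3", "SF6", "CH3Cl"):
    print(formula, atom_groups(formula))
```

Ozone lands in a single group, while methyl chloride (CH3Cl) lands in three, so a strict hierarchy would need a rule for which atom takes precedence.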

    Then, in addition to the attributes that R. Hornbrook specified, the metadata table should also contain commonly used attributes like "ReactiveGases" or "GreenhouseGases", so that someone who looks for GreenhouseGases gets all species considered to be climate-active trace gases, and someone looking for the traditional "ReactiveGases" would find their historical set of species.

I'm less convinced that this is necessary. Are the vocabulary tables also to be used as a teaching and research tool? Or are we merely providing a database of identified observations? Many of the attributes are merely applied knowledge. If, however, they can be added like a series of tags, then perhaps that would be acceptable, but it would still require that we apply prior or learned knowledge, which may become tedious with hundreds of gases.

fstuerzl commented 3 years ago

Descriptions (IUPAC name, PubChem CID, CAS number) added to gas phase variables: https://github.com/wmo-im/wmds/compare/issue261 -> moved to separate issue: see #284

fstuerzl commented 3 years ago

Please review the following list of tags related to gas phase variables and add terms, if necessary:

sebvi commented 3 years ago

Should we add functional groups (Ketones, Aldehydes, Terpenes, etc.)? It is useful to perform a search through the list based on chemical groups or class of compounds: "select all terpenes", etc.

rhornbrook commented 3 years ago

I agree. We could add any or all of the following:

  • aromatic
  • aldehyde
  • ketone
  • ether
  • ester
  • alcohol
  • acid (which includes species like HNO3, HCl, H2SO4, HCOOH, acetic acid, etc.)*
  • terpene (alternatively, monoterpene, sesquiterpene, etc.?)
  • nitrate
  • nitrile
  • OVOC (another name for "oxygen containing compound")
  • BVOC

*alternatively, we could have inorganic acids and organic acids, and separate them based on the presence or absence of carbon.
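
A minimal sketch of that carbon-based split, assuming formulas are given as simple strings; `contains_carbon` and `acid_tag` are hypothetical helper names. Element symbols are tokenised first, because a plain substring test for "C" would wrongly flag the chlorine in HCl as carbon.

```python
import re


def contains_carbon(formula):
    """True if the element C (not Cl, Ca, ...) appears in the formula."""
    return "C" in re.findall(r"[A-Z][a-z]?", formula)


def acid_tag(formula):
    """Split acids purely on the presence or absence of carbon."""
    return "organic acid" if contains_carbon(formula) else "inorganic acid"


for formula in ("HNO3", "HCl", "H2SO4", "HCOOH", "CH3COOH"):
    print(formula, "->", acid_tag(formula))
```

One caveat: carbonic acid (H2CO3) contains carbon but is conventionally treated as inorganic, so a pure presence-of-carbon rule would still need a short exception list.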

I'm not a fan of "Reactive Gas", which is... all gases, depending on the subject, and thus not exactly useful.

gaochen-larc commented 3 years ago

Please review the following list of tags related to gas phase variables and add terms, if necessary:

  • Greenhouse Gas
  • Reactive Gas
  • VOC
  • Alkane
  • Alkene
  • Alkyne
  • POP
  • PAH
  • CFC
  • HCFC
  • HFC
  • PFC
  • Halocarbon
  • Halon
  • Oxygen containing compound
  • Hydrogen containing compound
  • Nitrogen containing compound
  • Carbon containing compound
  • Sulphur containing compound

I see some overlaps among these tags. Should we have a hierarchy, like VOC/Alkane? "Carbon containing compound" is very broad: it includes VOC, CFC, HCFC... Should we have it? I would like to suggest having a discussion about these tags. My current thinking is to have a minimal number of tags that are nevertheless inclusive...

rhornbrook commented 3 years ago

My understanding is that the point of moving to "tags" from the previous hierarchy system is that chemical species can have multiple tags. Even the previous suggestions included both halocarbons and CFCs: all CFCs are halocarbons, but not all halocarbons are CFCs. Presumably, someone might search specifically for halocarbons or specifically for CFCs, and the two searches will return different results. I'm happy to discuss further, though.
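
To illustrate, here is a small sketch of how overlapping tags behave in a search. The tag assignments in `TAGS` are illustrative assumptions, not the agreed WMDS tagging, and `search` is a hypothetical helper.

```python
# Illustrative tag assignments only; the real tagging is still to be decided.
TAGS = {
    "CFC-11": {"Halocarbon", "CFC", "Greenhouse Gas"},
    "HFC-134a": {"Halocarbon", "HFC", "Greenhouse Gas"},
    "CH3Cl": {"Halocarbon"},
    "CH4": {"Greenhouse Gas", "Alkane"},
}


def search(tag):
    """Return the names of all species carrying the given tag."""
    return sorted(name for name, tags in TAGS.items() if tag in tags)


print(search("Halocarbon"))      # broad search
print(search("CFC"))             # narrow search
print(search("Greenhouse Gas"))  # a different view on the same entries
```

The broad tag returns every halogenated carbon compound, the narrow tag returns only the CFCs, and no hierarchy between the two has to be stored.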

jbnowak-larc commented 3 years ago

I had the same understanding: multiple "tags" would make it easier for people with different backgrounds to find a chemical species. Yes, a broad "tag" such as carbon containing compound has overlap, but that is necessary to keep species from falling through the cracks. I also feel that reactive gas is problematic because it is difficult to define.

joergklausen commented 3 years ago

The whole point of assigning tags is to allow several of them: the fewer the better, but as many as needed. In setting up the revised variable code list, very thorough tagging is needed: each variable needs to be assessed against each available tag, and this exercise needs to be repeated for each new tag someone may come up with. That's quite a lot of work that someone (you!) will have to do. So, I suggest starting out with a small number of tags.
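
As a back-of-the-envelope sketch of that workload: the counts below (300 variables, 20 tags) are made-up numbers for illustration, not the actual size of the code list, and the function name is hypothetical.

```python
def assessments_needed(n_variables, n_tags):
    """Each variable is checked once against each tag."""
    return n_variables * n_tags


n_variables = 300  # assumed number of gas phase variables
n_tags = 20        # assumed number of tags

initial = assessments_needed(n_variables, n_tags)
after_one_more_tag = assessments_needed(n_variables, n_tags + 1)

print(initial)                       # total yes/no decisions up front
print(after_one_more_tag - initial)  # extra decisions for one new tag
```

Every additional tag costs one full pass over the variable list, which is why a short initial tag list keeps the effort manageable.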

I tend to agree that 'reactive gas' doesn't add much value. It's been around for historical reasons, but I think it should be scrapped.

I was tempted to suggest 'organic' and 'inorganic' as very fundamental tags but I am hesitant now ... does this add value?

I would suggest to simplify tags like 'sulphur containing compound' to 'sulphur' or 'S species' or similar. Any objection?

joergklausen commented 3 years ago

See https://github.com/wmo-im/wmds/wiki/Tags-for-observed-variables for discussion.

dkubisti commented 3 years ago

Thanks for the discussion and the comprehensive gas table list. I support adding the functional groups as additional tags. I suggest adding two more groups: radicals and peroxides.

I agree that these tags do not need a hierarchical structure. The major aim is for species to be findable for all user groups. I also agree that the tag Reactive Gas is not very helpful.

I like Joerg's suggestion to abbreviate tags such as "sulphur containing compound".

rhornbrook commented 3 years ago

Perhaps we can also add "isotopologue"? Or something to define "specified isotopologue"? (For example, we are adding CH3D, and I would like to tag this as a specified isotopologue.)

dkubisti commented 3 years ago

Species like SF6 and NF3 are currently not tagged as halogenated species. What about adding the tag "halogen containing compound" and removing the tag "halocarbon"? Halocarbons would then be found as a combination of the tags Halogen species + organic.

rhornbrook commented 3 years ago

    Species like SF6 and NF3 are currently not tagged as halogenated species. What about adding the tag "halogen containing compound" and removing the tag "halocarbon"? Halocarbons would then be found as a combination of the tags Halogen species + organic.

I agree - the tag "halogen species" is more generic than halocarbon, and should be adopted instead to make allowance for inorganic halogen species.
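
A minimal sketch of how that combination could be queried, assuming an "Organic" tag exists alongside "Halogen species". The tag names, the assignments in `TAGS`, and the helper `with_all_tags` are illustrative assumptions.

```python
# Illustrative tag assignments only.
TAGS = {
    "SF6": {"Halogen species"},
    "NF3": {"Halogen species"},
    "CFC-11": {"Halogen species", "Organic"},
    "CH4": {"Organic"},
}


def with_all_tags(*wanted):
    """Return species that carry every one of the requested tags."""
    required = set(wanted)
    return sorted(name for name, tags in TAGS.items() if required <= tags)


print(with_all_tags("Halogen species"))             # includes SF6 and NF3
print(with_all_tags("Halogen species", "Organic"))  # the former halocarbons
```

Requiring both tags reproduces the old halocarbon set, while the single generic tag also picks up the inorganic halogen species.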

fstuerzl commented 3 years ago

Discussion completed