linked-statistics / xkos

A SKOS extension for statistical classifications

Special case of Combined Nomenclature/Harmonized System classifications #43

Open delcada opened 7 years ago

delcada commented 7 years ago

Some specificities linked to the CN/HS (Combined Nomenclature/Harmonized System) classifications seem not to be addressed in the specification. It should however be noted that these two classifications are central to the integrated system of international statistical classifications as they provide the "building blocks" for all product classifications (CPC, CPA, PRODCOM, SITC, etc.), i.e. the categories of these classifications are defined in terms of categories of HS or CN.

These specificities are the following:

1) some entries have no hierarchical level (see Picture 1 below)
2) some entries refer to several hierarchical levels at the same time (see Picture 2 below)
3) the explanatory notes can contain images or pictures (see Picture 3 below)
4) these classifications have intermediary levels which are not part of the hierarchy (see Picture 4 below)
5) a field called "DASHES" contains a varying number of "dash" symbols ("-"), which are used to indicate that texts belonging to upper hierarchical levels are omitted (see Picture 5 below)
6) the most useful labels for users are not the official labels but what are called the self-explanatory texts. CN has more than 3,000 codes labelled "Other", which are a problem for users (what is the coverage of these codes?). As the name indicates, self-explanatory texts provide textual descriptions which are more detailed and useful to users (e.g. the self-explanatory text for 2805 30 80 "Other" reads: "Rare-earth metals, scandium and yttrium, of a purity by weight of <95% (excl. intermixtures and interalloys)"). These self-explanatory texts are also useful for visualisation purposes; without these detailed textual descriptions, it would generally be impossible to understand correspondence tables and classifications defined on the basis of CN/HS categories.

To manage points 1), 2) and 4), an additional field called "CNKEY" is used to provide the proper sequential order of the codes.
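As a rough illustration of why such a sequencing field helps, the sketch below sorts a few entries by a CNKEY-like key so that an entry without a code still lands in its published position. The record layout, the key values and the zero-padded string format are assumptions made for this example; they are not taken from the actual CN files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    cnkey: str            # hypothetical sequencing key, assumed zero-padded
    code: Optional[str]   # None for an entry that carries no code of its own
    label: str


# Illustrative rows, deliberately out of order.
entries = [
    Entry("000000000030", "0101 21 00", "Pure-bred breeding animals"),
    Entry("000000000010", "0101", "Live horses, asses, mules and hinnies"),
    Entry("000000000020", None, "Horses"),
]

# Sorting on the key alone restores the sequence; sorting on the code
# would fail here because one entry has no code at all.
ordered = sorted(entries, key=lambda e: e.cnkey)

for e in ordered:
    print(e.cnkey, e.code or "(no code)", e.label)
```

In an RDF rendering, a key like this would have to be carried by some ordering property or list structure; which one is a modelling decision this sketch does not address.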

In view of the importance of these two classifications, it is essential that these specificities are addressed in the specification.

Picture 1

cn_01

Picture 2

cn_02

Picture 3

cn_03

Picture 4

cn_04

Picture 5

cn_05

tfrancart commented 6 years ago
  1. XKOS does not require that a concept be tied to a classification level. Besides, if these entries cannot be used to classify items and are just "node labels", they should be modeled as skos:Collection as described in this section of the SKOS-PRIMER.

For many description applications, for instance, "node labels" are entities of a really specific nature, and must not be used as object indices alongside "normal" concepts. Representing them as mere concepts is therefore clearly not a best practice.

  2. XKOS does not prevent a concept from being tied to several classification levels.
  3. The objects of note properties are not restricted in type (as with SKOS annotation properties), so they can be XHTML fragments and can reference images.
  4. Similar to point 1: these intermediary levels, if they cannot be used to classify items, should be modeled as skos:Collection. This is already covered by SKOS, and refined by the ISO 25964 extension of SKOS.
  5. XKOS does not prevent dashes from being used as, or included in, labels. The best option here could be to replace the dashes with the labels they refer to. "Dashes" labels can be stored as alternate labels, or a specific property can be created.
  6. Self-explanatory texts could be stored in skos:description values, or a specific property could be created. If one prefers to store these values as labels, SKOS-XL can be used.
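
The suggestion above to replace the dashes with the labels they refer to can be sketched as follows. This is only an illustration under stated assumptions: rows arrive in their published order, the number of "-" characters gives the depth below the heading, and depth never jumps by more than one step. The function name `expand_labels`, the separator and the sample rows are hypothetical.

```python
def expand_labels(rows, sep=" > "):
    """Build full labels from (dashes, text) rows given in published order.

    The number of '-' characters in `dashes` is taken as the depth of the
    row; the texts of all currently open upper levels are prepended.
    """
    stack = []   # stack[i] is the text currently open at depth i
    full = []
    for dashes, text in rows:
        depth = dashes.count("-")
        del stack[depth:]      # close this depth and anything deeper
        stack.append(text)
        full.append(sep.join(stack))
    return full


# Illustrative rows in sequence order (dash field, official text).
rows = [
    ("", "Live horses, asses, mules and hinnies"),
    ("-", "Horses"),
    ("--", "Pure-bred breeding animals"),
    ("--", "Other"),
    ("-", "Asses"),
]

expanded = expand_labels(rows)
for label in expanded:
    print(label)
```

With these rows, the bare "Other" entry becomes "Live horses, asses, mules and hinnies > Horses > Other". The original dash-prefixed text could still be kept alongside, for example as an alternate label, as the comment proposes.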