linked-statistics / xkos

A SKOS extension for statistical classifications

Introduce a LevelType class ? #4

Open FranckCo opened 12 years ago

FranckCo commented 12 years ago

The UML model introduces a LevelType class to which the depth and a description are attached. Did we really decide that ?

FranckCo commented 12 years ago

Dan's comment on this : "I don’t recall this either. However, it isn’t doing any harm. It houses a set of attributes associated with the level, apart from the level. There is also the possibility this can be reused, and that might be important when comparing or describing different versions of the same classification, e.g., NAICS 2002 and NAICS 2007. Their basic structures and associated meanings on levels are the same."

FranckCo commented 12 years ago

I think, on the contrary, that it complicates the model and the data queries for very little potential for (or interest in) reuse.

arofan commented 12 years ago

I am with Dan on this one - you can always ignore it, but for some classifications this will be useful.

FranckCo commented 12 years ago

Decision during the February 2 teleconference: the use case must be clarified. We keep the issue open but postpone it to version 2. In the meantime, LevelType is not introduced.

FlavioRizzolo commented 5 years ago

I'm adding here a use case in support of introducing a LevelType class to decouple hierarchical relationships from the actual objects (levels and items).

For instance, some classification variants can have additional levels, or skip levels, relative to the standard classification they are based on. This occurs not only with disseminated variants but also with dozens of internal variants created for a variety of analyses. For instance, a Standard Geography Classification in Canada has Geographical regions of Canada, Provinces and territories, Census divisions and Census subdivisions as levels (Figure 2-a).

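[Image: Figure 2, panels a to d, showing the level hierarchies of the standard classification and its variants]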

Some variants skip Geographical regions of Canada altogether, starting the hierarchy at the Provinces and territories level (Figure 2-b). Others have additional levels, e.g. Economic regions (Figure 2-c) or Census agricultural regions and Census consolidated subdivisions (Figure 2-d). The common levels (and their items) are exactly the same in all cases; the only thing that changes is the parent-child relationships between them. From a classification management point of view, it makes sense to reuse the common levels in Figure 2-a as much as possible. That is only possible when the parent-child relationships, both between levels and between items, are separate entities.
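To make the idea concrete, here is a minimal Python sketch of levels that are defined once and then linked differently per variant. It is only an illustration, not a proposal for the XKOS vocabulary itself: the names `Level`, `LevelLink`, `chain` and `levels_of` are hypothetical, and the position of Economic regions in the Figure 2-c variant is an assumption, since the figure is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """A level defined once and shared by every variant (hypothetical class)."""
    name: str


@dataclass(frozen=True)
class LevelLink:
    """A parent-child relationship between two levels, kept as its own entity."""
    parent: Level
    child: Level


def chain(*levels):
    """Build the parent-child links for an ordered sequence of levels."""
    return [LevelLink(p, c) for p, c in zip(levels, levels[1:])]


def levels_of(links):
    """Recover the ordered levels from a non-empty list of links."""
    return [links[0].parent] + [link.child for link in links]


# Levels named in the use case, each created exactly once.
regions = Level("Geographical regions of Canada")
provinces = Level("Provinces and territories")
divisions = Level("Census divisions")
subdivisions = Level("Census subdivisions")
economic = Level("Economic regions")

# Each variant is just a different set of links over the same Level objects.
standard = chain(regions, provinces, divisions, subdivisions)        # Figure 2-a
skip_regions = chain(provinces, divisions, subdivisions)             # Figure 2-b
# Assumption: Economic regions sits between provinces and census divisions.
with_economic = chain(provinces, economic, divisions, subdivisions)  # Figure 2-c
```

In this sketch a variant never copies a level; it only adds or omits links, which is the kind of reuse the use case asks for.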

Skipping a level is also common in economic statistics when doing multidimensional analysis or aggregations (e.g. System of National Accounts). Dimensions are oftentimes created from classifications, e.g. the North American Industry Classification System (NAICS), the North American Product Classification System (NAPCS) and others. However, it’s common that different levels are skipped and others added depending on the needs of the analysis at hand.

Another type of variant regroups items in different ways. For instance, some NAICS 2012 variants, e.g. Information and communication technology sector, Energy sector, and Content and media sector, add a top level that partitions items into two categories: those that belong to the sector in the name and those that don’t. Some new items are created to deal with that partition at different levels, but most of them are unchanged w.r.t. the standard classification they are based on, i.e. NAICS 2012 in this case. Reusing all those unchanged items across all relevant variants is only possible when the parent-child relationships between levels and between items are separate entities.
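The same idea at the item level can be sketched as follows. This is a rough illustration under stated assumptions: the item identifiers (`A`, `A1`, `B`, `B1`, `IN`, `OUT`) are hypothetical placeholders rather than real NAICS codes, and the helper names `children` and `reused_items` are made up for the example. Each hierarchy is stored as a child-to-parent mapping that is separate from the items themselves.

```python
# Shared item registry: identifiers and labels are hypothetical placeholders.
ITEMS = {
    "A": "Item A",
    "A1": "Item A1",
    "B": "Item B",
    "B1": "Item B1",
}

# Items that exist only in the sector variant (the new top-level partition).
VARIANT_ITEMS = {
    "IN": "Belongs to the sector",
    "OUT": "Does not belong to the sector",
}

# Standard classification: child -> parent (None marks a top-level item).
standard = {"A": None, "B": None, "A1": "A", "B1": "B"}

# Sector variant: the same items, re-parented under the new partition.
sector_variant = {
    "IN": None,
    "OUT": None,
    "A": "IN",
    "B": "OUT",
    "A1": "A",
    "B1": "B",
}


def children(hierarchy, parent):
    """Return the sorted identifiers of the items directly under `parent`."""
    return sorted(child for child, p in hierarchy.items() if p == parent)


def reused_items(base, variant):
    """Return the sorted identifiers of the items shared by both hierarchies."""
    return sorted(set(base) & set(variant))
```

Only the two partition items are new in the variant; every other item is taken unchanged from the shared registry, and only its parent link differs.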