linked-statistics / COOS

Core ontology for official statistics
Creative Commons Attribution 4.0 International

Model links between GSBPM and GSIM #19

Open FranckCo opened 4 years ago

FranckCo commented 4 years ago

The UNECE "Supporting Standards" group lists links between GSBPM sub-processes and GSIM objects. These links could be rendered via an RDF object property. A GSIMObject class could also be created to scope the domain or range of this property.
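Here is a minimal sketch of how such a property might behave. It assumes a hypothetical property name `coos:relatedGSIMObject` with `coos:SubProcess` as its domain and `coos:GSIMObject` as its range. Triples are plain Python tuples of prefixed names rather than an RDF library, and `gsim:Variable` is an illustrative placeholder. The function applies the RDFS domain and range entailment rules to a single link.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# coos:relatedGSIMObject and gsim:Variable are hypothetical names.
RDF_TYPE = "rdf:type"

schema = {
    ("coos:relatedGSIMObject", RDF_TYPE, "owl:ObjectProperty"),
    ("coos:relatedGSIMObject", "rdfs:domain", "coos:SubProcess"),
    ("coos:relatedGSIMObject", "rdfs:range", "coos:GSIMObject"),
}

data = {
    ("<http://id.unece.org/activities/subProcess/7.3>",
     "coos:relatedGSIMObject", "gsim:Variable"),
}


def apply_domain_range(schema, data):
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range.

    Simplification: assumes at most one domain and one range per property.
    """
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))  # subject gets the domain type
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))   # object gets the range type
    return inferred


inferred = apply_domain_range(schema, data)
for triple in sorted(inferred):
    print(triple)
```

In RDFS, domain and range drive inference rather than validation. Linking a sub-process to a resource that was never declared a GSIMObject therefore makes it one instead of raising an error. Checking such constraints would need a separate mechanism such as SHACL.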

FranckCo commented 3 years ago

Several points to discuss here:

JALinnerud commented 3 years ago

Could Class and Object be ModelElements? Or GSIMModules, where the modules are Base, Concept, Exchange, Structure and Business? I agree to using PROV-O and any other international ontologies we can. GSIM and GSBPM could be linked by input, output, resources and control (ref. IDEF0). Will we need OWL 2 object properties and datatype properties?

FranckCo commented 3 years ago

Decided during 23/02/21 meeting: create GSIMClass, with equivalent class GSIMObject and daughter GSIMEntity. Implemented in Turtle by commits https://github.com/linked-statistics/COOS/commit/2a811f05d3704f9e9ec9100fbcbde96a9b32340f, https://github.com/linked-statistics/COOS/commit/74b0ae1a6b01182c04d11268b271223f73e349e3 and https://github.com/linked-statistics/COOS/commit/f1515ddc0bb0ff6031fe4c2767119d4d0152e5e9
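The decision can be written as two axioms. The following plain-Python sketch uses prefixed names only and does not reproduce the actual COOS namespace IRI. It computes which classes an individual typed `coos:GSIMEntity` also belongs to, treating `owl:equivalentClass` as subsumption in both directions.

```python
from collections import defaultdict

# Axioms from the 23/02/21 decision, as (subject, predicate, object) tuples.
axioms = [
    ("coos:GSIMClass", "owl:equivalentClass", "coos:GSIMObject"),
    ("coos:GSIMEntity", "rdfs:subClassOf", "coos:GSIMClass"),
]


def superclasses(cls, axioms):
    """Return all named classes that subsume cls, excluding cls itself.

    owl:equivalentClass is treated as rdfs:subClassOf in both directions.
    """
    up = defaultdict(set)
    for s, p, o in axioms:
        if p == "rdfs:subClassOf":
            up[s].add(o)
        elif p == "owl:equivalentClass":
            up[s].add(o)
            up[o].add(s)
    # Depth-first walk over parent edges; `seen` also guards against the
    # cycle that equivalence introduces.
    seen, stack = set(), [cls]
    while stack:
        for parent in up[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    seen.discard(cls)
    return seen


print(sorted(superclasses("coos:GSIMEntity", axioms)))
# ['coos:GSIMClass', 'coos:GSIMObject']
```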

For GAMSO and GSBPM we did not keep the name of the model in the name of the classes (e.g. ActivityArea and not GAMSOActivityArea, Phase and not GSBPMPhase), so why should we for GSIM? But if we drop the GSIM part, we are left with "Class" or "Object", not very informative. So, since GSIM is a statistical information model, why not "StatisticalInformationClass", or "StatisticalInformationObject"?

FlavioRizzolo commented 3 years ago

I think StatisticalInformationObject/Class works. The question of whether we need the term "statistical" in the name has been raised by Edgardo:

I agree with the proposal of removing the “GSIM” prefix and substituting a generic one. However, my hesitation is whether it is necessary to qualify them with “Statistical”. Will there be Objects or Classes other than “statistical” ones in the scope? If not, and for the sake of shortening the names, I would just say “InformationObject” and “InformationClass”.

Given we included GAMSO, we might want to support information entities that are not statistical. Having said that, GSIM is a statistical model, same as GSBPM, so its scope is the statistical information entities.

I suggest having InformationClass at the top and StatisticalInformationClass/Object as subclasses for the GSIM objects.

pafrance commented 3 years ago

We are fine with a superclass called "InformationClass" as the source node for GSIM subclasses. This can be linked as a stub to the corresponding GSBPM objects. Once the link has been established, we can derive other GSIM classes by further specialization, so it is fine if the class is not very "informative" at this level, with InformationObject being an instance of the class instead. What do you think? Paolo & Adele

FranckCo commented 3 years ago

@FlavioRizzolo DDI-CDI uses InformationObject, I think, which is consumed and produced by activities:

[Image: DDI-CDI diagram]

So, following your suggestion, I would go for prov:Entity -> coos:InformationObject -> coos:StatisticalInformationObject
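The proposed chain can be checked with a short sketch that assumes single inheritance and uses prefixed names only. PROV-O gives `prov:used` the range `prov:Entity`. Placing `coos:StatisticalInformationObject` under `prov:Entity` therefore means that GSIM objects consumed by an activity are typed consistently with what a reasoner would infer from `prov:used`.

```python
# Proposed chain: prov:Entity -> coos:InformationObject -> coos:StatisticalInformationObject,
# stored child -> parent (single inheritance is a simplifying assumption).
SUBCLASS_OF = {
    "coos:StatisticalInformationObject": "coos:InformationObject",
    "coos:InformationObject": "prov:Entity",
}

# In PROV-O, prov:used has rdfs:range prov:Entity.
PROV_USED_RANGE = "prov:Entity"


def ancestors(cls):
    """Walk the rdfs:subClassOf chain upward from cls, nearest parent first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def within_prov_used_range(cls):
    """True if cls is prov:Entity or one of its declared subclasses."""
    return cls == PROV_USED_RANGE or PROV_USED_RANGE in ancestors(cls)


print(ancestors("coos:StatisticalInformationObject"))
print(within_prov_used_range("coos:StatisticalInformationObject"))
```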

FranckCo commented 3 years ago

@FlavioRizzolo

We discussed different possibilities previously for representing links between GSIM objects and GSBPM sub-processes, in particular the idea of representing individual sub-processes as some kind of "abstract" or "prototype" individuals with class-like features so they can be used as domains or ranges of properties. We mentioned OWL NamedIndividual as a possibility, but I checked the specification and I understand that OWL named individuals are just individuals that have an IRI (https://www.w3.org/TR/owl2-syntax/#Individuals).

Actually, considering how GSBPM sub-processes are currently declared in COOS, typing them as owl:NamedIndividual might actually restrict the possibilities. Just typing them as coos:SubProcess as it is currently done does not entail that they are individuals (https://stackoverflow.com/questions/37157883/member-of-an-owlclass-versus-owlnamedindividual), if I understand correctly. You could still add axioms treating, for example, http://id.unece.org/activities/subProcess/7.3 as a class, using metamodeling. However, I'm not sure we want to go that way.
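A small sketch of what the metamodeling option would look like in the data. In OWL 2 this is called punning: the same IRI may name both a class and an individual, and the two readings are interpreted independently. The triple typing `ex:run-2021-q1` by sub-process 7.3 is a hypothetical axiom added only for illustration. It does not reflect anything COOS currently declares, and whether to adopt the approach is left open above.

```python
SP_7_3 = "<http://id.unece.org/activities/subProcess/7.3>"

triples = {
    # As currently declared in COOS: the sub-process is typed coos:SubProcess.
    (SP_7_3, "rdf:type", "coos:SubProcess"),
    # Hypothetical metamodeling axiom: a concrete execution typed by the sub-process,
    # which uses the sub-process IRI as a class.
    ("ex:run-2021-q1", "rdf:type", SP_7_3),
}


def punned_iris(triples):
    """IRIs used both as an instance (rdf:type subject) and as a class (rdf:type object)."""
    instances = {s for s, p, _ in triples if p == "rdf:type"}
    classes = {o for _, p, o in triples if p == "rdf:type"}
    return instances & classes


print(punned_iris(triples))
```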

FlavioRizzolo commented 3 years ago

Example of InformationObjects being inputs and outputs of Activities:

Consider "Design Frame and Sample". Inputs are "DataSet", "DataStructure", "Variable", and "Population", and outputs are "Process Method" and "Rules", among others. Those are GSIM objects, which seem to be individuals of either coos:InformationObject, or coos:StatisticalInformationObject, to be more precise. They seem to me to be at the same level of "abstract" individuals as "Design Frame and Sample" is an "abstract" individual of the subProcess class. That aligns with prov:used/prov:wasGeneratedBy as well, and their inverses.
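The example above can be sketched as PROV triples. GSBPM numbers "Design frame and sample" as sub-process 2.4. The IRI below assumes the same pattern as `http://id.unece.org/activities/subProcess/7.3` and is not confirmed. The `gsim:` prefix and the object names are placeholders standing for the "abstract" GSIM individuals. PROV-O declares `prov:generated` as the inverse of `prov:wasGeneratedBy`, so the sketch materializes that inverse for the outputs.

```python
# Assumed IRI, following the subProcess/7.3 pattern; gsim:* names are placeholders.
DESIGN_FRAME = "<http://id.unece.org/activities/subProcess/2.4>"

inputs = ["gsim:DataSet", "gsim:DataStructure", "gsim:Variable", "gsim:Population"]
outputs = ["gsim:ProcessMethod", "gsim:Rule"]

triples = set()
# Activity -> Entity: the sub-process used each input.
for entity in inputs:
    triples.add((DESIGN_FRAME, "prov:used", entity))
# Entity -> Activity: each output was generated by the sub-process.
for entity in outputs:
    triples.add((entity, "prov:wasGeneratedBy", DESIGN_FRAME))

# Materialize prov:generated, the PROV-O inverse of prov:wasGeneratedBy.
inverses = {(o, "prov:generated", s) for s, p, o in triples if p == "prov:wasGeneratedBy"}
triples |= inverses


def used_by(activity):
    """Entities the activity used (its inputs)."""
    return sorted(o for s, p, o in triples if s == activity and p == "prov:used")


def generated_by(activity):
    """Entities the activity generated (its outputs), via the inverse property."""
    return sorted(o for s, p, o in triples if s == activity and p == "prov:generated")


print(used_by(DESIGN_FRAME))
print(generated_by(DESIGN_FRAME))
```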

FlavioRizzolo commented 3 years ago

To discuss, if possible:

[Image: proposed class diagram]

The current classes are in white and the suggested additions in green. Note also, in grey, the renaming of Information "Object" and Statistical Information "Object": "object" is somewhat controversial, and we are removing the Information Object class from DDI-CDI, so I thought "entity" might be a better term, especially since it aligns with PROV.

FlavioRizzolo commented 3 years ago

Also, a question: currently, InformationObject (or Entity) is defined as "Mother of all classes defined in GSIM" and StatisticalInformationObject (or Entity) as "Information object representing statistical information". Should both be the definition of StatisticalInformationObject (or Entity)? To me GSIM doesn't apply to other types of information, e.g. HR, Finance, Procurement, etc.

InKyungChoi commented 3 years ago

Also, a question: currently, InformationObject (or Entity) is defined as "Mother of all classes defined in GSIM" and StatisticalInformationObject (or Entity) as "Information object representing statistical information". Should both be the definition of StatisticalInformationObject (or Entity)? To me GSIM doesn't apply to other types of information, e.g. HR, Finance, Procurement, etc.

I know GSIM is the "Statistical Information Model"... but are all GSIM objects "statistical information"? I am thinking of something like "Process Step", "Process Control Design" or "Identifiable Artefact": they might be concepts needed for the statistical production process, but they don't seem to be "statistical" information in the way that "Population" or "Variable" are.

JALinnerud commented 3 years ago

The introduction of Statistical Concept might be confusing for GSIM users that already have Concept (https://statswiki.unece.org/display/clickablegsim/Concept) with subtypes Population, Universe, Unit Type, Variable and Category.

JALinnerud commented 3 years ago

Regarding the adjective 'Statistical': when NSIs contribute data sets to national catalogues/portals and European portals, users know the content is statistical through dct:publisher, dct:creator, foaf:Agent, dcat:theme, dct:subject, skos:Concept, etc. I am still worried that by using the adjective Statistical everywhere we might be reducing our interoperability with other international and national groups and organisations, e.g. national mapping agencies. The end users simply want to find data sets and combine them with other data sets. Are our data sets different from other data sets, or is it 'just' that our organisations try to adhere to certain quality criteria? Are we giving ourselves more work by introducing the adjective Statistical and at the same time reducing the quality for our users? How open is our data when we use terms that are not used by other organisations?

FlavioRizzolo commented 3 years ago

I am still worried that by using the adjective Statistical everywhere we might be reducing our interoperability with other international and national groups and organisations, e.g. national mapping agencies. The end users simply want to find data sets and combine them with other data sets. Are our data sets different from other data sets, or is it 'just' that our organisations try to adhere to certain quality criteria? Are we giving ourselves more work by introducing the adjective Statistical and at the same time reducing the quality for our users? How open is our data when we use terms that are not used by other organisations?

This is a good point. Other than statistical products, the rest, i.e. data point, data structure, dataset and all its sub-classes, doesn't really need to be "statistical".

FlavioRizzolo commented 3 years ago

Also, a question: currently, InformationObject (or Entity) is defined as "Mother of all classes defined in GSIM" and StatisticalInformationObject (or Entity) as "Information object representing statistical information". Should both be the definition of StatisticalInformationObject (or Entity)? To me GSIM doesn't apply to other types of information, e.g. HR, Finance, Procurement, etc.

I know GSIM is the "Statistical Information Model"... but are all GSIM objects "statistical information"? I am thinking of something like "Process Step", "Process Control Design" or "Identifiable Artefact": they might be concepts needed for the statistical production process, but they don't seem to be "statistical" information in the way that "Population" or "Variable" are.

I feel the same way. It seems to me that objects in the Concepts and Structures groups are Information Entities whereas objects in the Business group are rather "Business" Entities. Not sure what to think of Exchange though.

FlavioRizzolo commented 3 years ago

Based on the recent comments, I have two questions to discuss:

Question 1: Should we add Business Entity as a child of prov:Entity to capture at least the GSIM Business group?

Question 2: Should we remove Statistical Information Entity entirely?

FlavioRizzolo commented 3 years ago

Regarding the relationship between Dataset, Data Structure and Data Point, both GSIM and DDI-CDI have the same relationships as in the proposed diagram above.

From GSIM:

[Image: GSIM diagram]

From DDI CDI:

[Image: DDI-CDI diagram]

FranckCo commented 2 years ago

Waiting for the conclusion of the discussion in the GSIM revision Task Team.

tfrancart commented 1 year ago

@FranckCo @flo7894 Please find an analysis of this issue in https://github.com/linked-statistics/COOS/wiki/Issue-19-analysis-note