sacmwg / draft-ietf-sacm-terminology

SACM terminology aligned with best practice definitions, standard references, and terminology definitions of other work groups

Propose to remove the term metadata from the draft. #66

Open adammontville opened 6 years ago

adammontville commented 6 years ago

At its heart, metadata is simply "data about data", and there really isn't much more to say. There is a bunch of expositional text associated with the term in the draft:

In the SACM information model, data is referred to as Content. Metadata about the content is referred to as Content-Metadata, respectively. Content and Content-Metadata are combined into Subjects called Content-Elements in the SACM information model. Some information elements defined by the SACM information model can be part of the Content or the Content-Metadata. Therefore, if an information element is considered data or data about data depends on which kind of Subject it is associated with. The SACM information model also defines metadata about the data origin via the Subject Statement-Metadata. Typical examples of metadata are time stamps, data origin or data source.

This information isn't really helpful to the definition, and because metadata is a very common term and widely understood as data about data, I propose we remove the term from this draft.

henkbirkholz commented 6 years ago

I would like to rephrase it into a question and a meta-question:

Is the definition helpful to understand the term metadata and its usage of the sub-types wrt implementors or other readers of the document?

I am basically neutral. If not defined here, it has to be elaborated on where it is used most, which is in the information model, I think. I am slightly in favor of keeping both the definition and the context here in this doc, and of also having a redundant paragraph in the IM about when or how an IE type is used to represent metadata and when it represents content.

strazzie123 commented 6 years ago

My 2 cents.

Metadata is a fundamental tool that is useful for describing and prescribing the characteristics and behavior of a set of objects (where "set" can be 1, of course). I would be loath to remove the term for these reasons.

From Adam's original post:

In the SACM information model, data is referred to as Content. Metadata about the content is referred to as Content-Metadata, respectively. Content and Content-Metadata are combined into Subjects called Content-Elements in the SACM information model. Some information elements defined by the SACM information model can be part of the Content or the Content-Metadata. Therefore, if an information element is considered data or data about data depends on which kind of Subject it is associated with. The SACM information model also defines metadata about the data origin via the Subject Statement-Metadata. Typical examples of metadata are time stamps, data origin or data source.

IMHO, this definition needs to be reworked

This information isn't really helpful to the definition, and because metadata is a very common term and widely understood as data about data, I propose we remove the term from this draft.

While I agree that most people do think this, I believe that we should explicitly mention that metadata can be descriptive as well as prescriptive.

regards, John


henkbirkholz commented 6 years ago

These are probably two points worthwhile to be captured and worked into an improved definition (if we decide not to drop the term):

sacm commented 6 years ago

Sorry, I disagree with the first point. This doesn’t make sense from an object-oriented class design point-of-view, or from an ontological point-of-view.

An IE can’t have multiple types. It is either data or metadata, but not both. It would make far more sense to have separate class hierarchies to model data and metadata, and then allow data to aggregate specific metadata as needed.

Regards, John
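To make the design proposed above concrete, here is a minimal sketch of two separate class hierarchies in which data aggregates metadata. It assumes Python dataclasses; every class and field name (`Metadata`, `Timestamp`, `DataOrigin`, `Data`, `IPv4AddressData`) is hypothetical and not taken from the SACM information model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Metadata:
    """Root of the hypothetical metadata hierarchy."""


@dataclass(frozen=True)
class Timestamp(Metadata):
    iso8601: str


@dataclass(frozen=True)
class DataOrigin(Metadata):
    source: str


@dataclass
class Data:
    """Root of the hypothetical data hierarchy; aggregates metadata as needed."""
    metadata: List[Metadata] = field(default_factory=list)


@dataclass
class IPv4AddressData(Data):
    # Default required because the inherited `metadata` field has a default.
    address: str = "0.0.0.0"


# Illustrative values only.
ip = IPv4AddressData(
    address="192.0.2.1",
    metadata=[Timestamp("2017-12-15T00:00:00Z"), DataOrigin("collector-1")],
)
```

In this sketch an object is either a `Data` or a `Metadata` instance, never both, and metadata stays optional because the aggregated list may be empty.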


henkbirkholz commented 6 years ago

Hm, while certainly possible, it would create a lot of redundancy (YANG just removed a similar type of redundancy, via NMDA, by collapsing redundant trees). An overused but viable example: the IE IPv4-Address.

It can be part of a TE identifier -> being metadata.

It can be part of SACM Content -> being data.

Would you recommend introducing two types of IPv4-Address for this? NETCONF, for example, reverted that kind of decision.
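The alternative described here, one IE type whose role depends on where it is used, might be sketched as follows. This assumes Python's standard `ipaddress` module; `TEIdentifier`, `Content`, and their field names are hypothetical stand-ins, not definitions from the SACM information model.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class TEIdentifier:
    """Hypothetical TE identifier; an address used here acts as metadata."""
    address: IPv4Address


@dataclass(frozen=True)
class Content:
    """Hypothetical SACM Content; an address used here acts as data."""
    observed_address: IPv4Address


# One value of one type, used in two different contexts.
addr = IPv4Address("192.0.2.1")
as_metadata = TEIdentifier(address=addr)
as_data = Content(observed_address=addr)
```

Here no second address type is introduced; whether the address counts as data or metadata follows from the containing structure rather than from the type itself.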

henkbirkholz commented 6 years ago

Addendum: From an ontology point of view (I did not take that into account in my first reply), it is rather okay to tie (domain/range association) a concept via two different object properties to two different other core concepts (e.g. data & metadata). Even in the more restrictive scope of taxonomic parents(hip), there can be more than one parent concept (although this would render an a-box rather complex and might not be recommended in a production environment, as it requires more complex and reliable reasoners, i.e. potentially creates a lot of reasoning overhead).
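As a rough illustration of the domain/range association mentioned above, the sketch below models object properties as plain triples. It is a simplification that uses no ontology library, and the property names (`identifiesEndpointBy`, `carriesAsContent`) are hypothetical.

```python
from typing import Set, Tuple

# Each triple is (object property, domain, range); all names are hypothetical.
OBJECT_PROPERTIES: Set[Tuple[str, str, str]] = {
    ("identifiesEndpointBy", "Metadata", "IPv4-Address"),
    ("carriesAsContent", "Data", "IPv4-Address"),
}


def domains_reaching(concept: str) -> Set[str]:
    """Return the core concepts tied to `concept` through some object property."""
    return {domain for _prop, domain, rng in OBJECT_PROPERTIES if rng == concept}
```

Calling `domains_reaching("IPv4-Address")` yields both `Metadata` and `Data`, which is the sense in which a single concept can be associated with two core concepts without being declared twice.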

strazzie123 commented 6 years ago

Hm, while certainly possible, it would create a lot of redundancy (YANG just removed a similar type of redundancy, via NMDA, by collapsing redundant trees).

Sorry, I fail to see:

1) why we care if there is redundancy in an **information** model
   a) YANG is a **data** model
   b) the purpose of an info model is to define objects and their relationships; redundancy is a function of how good the model is

An overused but viable example: the IE IPv4-Address.

It can be part of a TE identifier -> being metadata.

It can be part of SACM Content -> being data.

Nope. As I said earlier, having something being of two types is problematic at best wrt code generation. Furthermore, why would an identifier ever be considered metadata? Isn't an identifier at least as important as data? :-) And remember, metadata is typically considered optional.

best, John


strazzie123 commented 6 years ago

Disagree. You must define formal logic that says that an object can be of two different types.

regards, John


henkbirkholz commented 6 years ago

Sorry, I fail to see:

1) why we care if there is redundancy in an information model
   a) YANG is a data model
   b) the purpose of an info model is to define objects and their relationships; redundancy is a function of how good the model is

I agree, I got distracted by Ontologies :-) Touché. This is about the IM. Redundancy vs. Readability and in consequence comprehensibility.

Nope. As I said earlier, having something being of two types is problematic at best wrt code generation. Furthermore, why would an identifier ever be considered metadata? Isn't an identifier at least as important as data? :-) And remember, metadata is typically considered optional.

I would still argue it is the same type (e.g. in my example) used in different contexts (e.g. via ontological object property relationships), which provides it with more context and makes it data or metadata in the first place. Correspondingly, until now I did not understand metadata as a specific subset of all types, like uint32 and uint32-metadata.

You made me become more neutral on this point. I am still slightly in favor of not creating redundant IE types for metadata and data, but I am starting to understand your point better.

strazzie123 commented 6 years ago

I would still argue it is the same type (e.g. in my example) used in different contexts (e.g. via ontological object property relationships), which provides it with more context and makes it data or metadata in the first place. Correspondingly, until now I did not understand metadata as a specific subset of all types, like uint32 and uint32-metadata.

The problem is that while this could work for ontologies, it doesn't work for info or data models.

regards,

John


adammontville commented 6 years ago

Thankfully (?), we are talking about a terminology draft, which is more aligned with an ontological perspective than one tied to a specific information or data model. Back to the original question: Do we, or do we not, keep the term metadata? It is already broken out into definition followed by exposition, so we could simply leave it, but as John pointed out we might want to clean up the expositional text to something more effective than:

In the SACM information model, data is referred to as Content. Metadata about the content is referred to as Content-Metadata, respectively. Content and Content-Metadata are combined into Subjects called Content-Elements in the SACM information model. Some information elements defined by the SACM information model can be part of the Content or the Content-Metadata. Therefore, if an information element is considered data or data about data depends on which kind of Subject it is associated with. The SACM information model also defines metadata about the data origin via the Subject Statement-Metadata. Typical examples of metadata are time stamps, data origin or data source.

John or @henkbirkholz if you have proposed text for the expositional replacement, please provide it.