jamesaoverton / qqv

Proposal for qualitative and quantitative values in OBO
https://jamesaoverton.github.io/qqv/

characteristics vs attributes vs values #1

Open wdduncan opened 1 year ago

wdduncan commented 1 year ago

Hi @jamesaoverton. I think it is great that you put this together. So, please understand that the following comments/recommendations are in the spirit of improving the proposal, and that I am not trying to tear down the work you've done.

Comments

Attributes.
Your definition of an attribute as "fundamental to what an entity" seems too strong. I can envision future debates about what counts as "fundamental", and it doesn't seem (to me at least) to be necessary. In COB (I'm not sure about PATO:quality) characteristics can be characteristics of any owl:Thing, including other characteristics. So, a more direct way to define more specific characteristics would be to re-use the has characteristic relation. E.g., using your penguin sex example:

:penguin1 :has_characteristic :genotypic_sex1 .
:genotypic_sex1 :has_characteristic :male_genotypic_sex1 .

This allows for cases (perhaps rare) in which the chain of characteristics may be more than two levels deep. E.g.:

:entity1 :has_characteristic :color1 .
:color1 :has_characteristic :purple1 .
:purple1 :has_characteristic :bright_purple1 .

Values. The range of has value in your examples is a characteristic. I find this confusing. The word 'value' in my mind (and perhaps others) connotes something to do with information/data.
The OBO community is definitely in need of a standardized way to relate entities to literal values. I think a better way to go about this is to define specific classes of values (e.g., quantity values, geolocation values, image values, etc.) as subtypes of information. E.g.:

:value1 a :quantity_value; :has_quantity "65"^^xsd:integer; :has_unit "inches" .
:value2 a :quantity_value; :has_quantity "70"^^xsd:integer; :has_unit "inches" .

:person1 :has_characteristic :height1 .
:height1 :has_characteristic :height_at_t1, :height_at_t2 .
:height_at_t1 :has_value :value1 .
:height_at_t2 :has_value :value2 .

In the NMDC, we used an approach like this to represent AttributeValues. For purposes of the OBO Foundry, though, "characteristic values" would be a better label.

(tagging related COB issue)

jamesaoverton commented 1 year ago

I was not trying to make a grand philosophical claim with the word "fundamental". All penguins have a mass because they are material entities. All penguins have a genotypic sex because they are birds (or maybe eukaryotes, I'm not a biologist). We just look up the class hierarchy for the differentia at each level.

What is the domain of 'has characteristic' here? It seems like a big change to include 'characteristic' in the domain of 'has characteristic'.

I agree that the colour example shows that more than one level of determination is possible and worth capturing.

I presented several versions of this work that included something like 'quantity values'. Some of the problems were:

  1. Ontologically, what is a value? If it's an ICE, then it's about something, which is not at all clear to me in every case. If it's any kind of GDC, what are the individuation conditions?
  2. If it's a fact that the penguin N1A1 weighs 3750g, then that's a fact whether or not there is any information about it. Facts should not require a detour into "information land".
  3. Symmetry between qualitative and quantitative modelling seems very important to me. If quantitative values are information, what are categorical values such as 'normal mass' or 'purple'?

When time permits, I intend to write a "Roads not travelled" document, with alternatives that we have explored and decided against. This work has involved years of drafting documents, presenting them, digesting feedback, and trying to find something simple and comprehensive. It takes a long time.

wdduncan commented 1 year ago

I was not trying to make a grand philosophical claim with the word "fundamental". All penguins have a mass because they are material entities.

This may be easy for things at the top of the hierarchy (e.g., material entity), but may get convoluted/difficult/contentious as you go deeper in the hierarchy. Plus, you may want to assign values to characteristics that aren't necessarily identified as being "fundamental".

What is the domain of 'has characteristic' here?

In COB, the domain of has characteristic is owl:Thing. So, has characteristic can be used to relate characteristics to characteristics. Are you using a different version of has characteristic?

Ontologically, what is a value?

I used the term information instead of ICE. Are they equivalent in COB, or in the scenarios you present? Also, I'm not sure too much ontological energy needs to be invested here. GDCs are already strange things (i.e., individuals that can be located in many other individuals), but they are in the Foundry. Values seem less controversial to me.

If it's a fact that the penguin N1A1 weighs 3750g, then that's a fact whether or not there is any information about it.

Yes. I agree the weight of N1A1 is a fact. However, the literals/data used to represent N1A1's weight seem best in information land, though this may not be a perfect fit. After all, "3750" could be recorded using Roman numerals, encoded in binary, etc. Moreover, weight is not a static thing. N1A1 is heavier on Jupiter, and lighter on the Moon.

The literal values can also be related directly w/o an intervening "value type". E.g.:

:mass1 :quantity_value "3750"; :unit "g" .

The "value types" just provide a more organized way to assign literal values, which is useful for data quality checks.

what are categorical values such as 'normal mass' or 'purple'?

These are characteristics. E.g.:

:N1A1 :has_characteristic :mass1 .
:mass1 :has_characteristic :normal_mass1 .

I find calling them "values" to be confusing, and it adds an extra level of complexity. If literal categorical values are needed, then something like enums can be used.

This work has involved years of drafting documents, presenting them, digesting feedback, and trying to find something simple and comprehensive.

I was only aware of one early Google doc about this, which I also commented on. I realize I stepped away for a bit, but I don't recall seeing any more communication about that document for quite some time. Maybe you moved the communication to another channel?

When time permits, I intend to write a "Roads not travelled" document

Yes. That would be helpful. However, I think it would be good for a number of others who may not operate in your immediate circle to weigh in before this becomes a de facto standard for the OBO community.

ramonawalls commented 1 year ago

:entity1 :has_characteristic :color1 .
:color1 :has_characteristic :purple1 .
:purple1 :has_characteristic :bright_purple1 .

I find the above very odd. Shouldn't these classes be in a subclass hierarchy? That is how PATO has done it for years. I do agree about the need to chain characteristics several levels deep, but I think it could be handled with a property chain.

Likewise, I find it strange that

In COB, the domain of has characteristic is owl:Thing. So, has characteristic can be used to relate characteristics to characteristics.

although I do see where it could be a useful way to relate modifiers to characteristics (e.g., blood pressure has_characteristic elevated, but is blood pressure even a characteristic? I think it could be.).

wdduncan commented 1 year ago

Hi @ramonawalls great to hear from you!

Shouldn't these classes be in a subclass hierarchy?

Yes, you could do it that way (and in many other cases too). The example was ad hoc ... Does this make for a better example?

:entity1 :has_characteristic :color1 .
:color1 :has_characteristic :purple1 .
:purple1 :has_characteristic :bright1 . # bright can apply to other colors

think it could be handled with a property chain.

Not sure I follow ... what would the property chain be? Do you mean something like :has_characteristic o :has_characteristic?
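For readers unfamiliar with the notation, a chain of this shape could be written in OWL 2 (Turtle syntax) roughly as follows. This is only a sketch: the ex: namespace and the ex:has_characteristic IRI are placeholders rather than COB's actual identifiers, and nothing in this thread confirms that this is the chain that was meant.

```turtle
@prefix ex:  <http://example.org/> .               # hypothetical namespace
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# has_characteristic o has_characteristic subPropertyOf has_characteristic
ex:has_characteristic a owl:ObjectProperty ;
    owl:propertyChainAxiom ( ex:has_characteristic ex:has_characteristic ) .

ex:entity1 ex:has_characteristic ex:color1 .
ex:color1  ex:has_characteristic ex:purple1 .

# With the chain axiom, an OWL 2 reasoner would entail:
#   ex:entity1 ex:has_characteristic ex:purple1 .
```

Note that a chain of a property with itself, concluding in the same property, amounts to declaring that property transitive, which may or may not be the intended behaviour.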

but is blood pressure even a characteristic?

Suppose blood pressure is represented as a process. In COB, processes can have characteristics.

:blood_pressure1 a :process .
:elevated1 a :characteristic .
:blood_pressure1 :has_characteristic :elevated1 .

I find it strange that ...

The notion of "characteristics of characteristics of characteristics ..." (i.e., second-order characteristics) strikes many as strange. Barry used to be against this. I don't know what he thinks now. In the current proposal, it should be noted that attributes and values are characteristics. So, it implements second-order characteristics.

I'm only proposing that instead of creating special labels for these second-order characteristics (i.e., attribute, value), we just call them "characteristics". I think "values" live best in information land (though it may not be an ideal fit).

ddooley commented 1 year ago

Pardon for combing through this one more time, but there are parts that will take more finessing to explain to users, and even I, looking at this for years, and mulling these last few days, stumble!

About the distinction between characteristic and attribute:

Do we need to make this distinction here, or can it be established elsewhere, separately, to the same effect, but one step removed from qualitative and quantitative value handling?

For purposes of argument only, and recalling some ontology concepts, presumably "substantial kind" characteristics are defined by axioms such as "is-a x if ‘has characteristic’ y", meaning that for sub-classes and instances, a characteristic necessarily exists for that entity by definition of its (or its ancestor’s) class. Here, James, you use ‘has attribute’. As you say, an attribute is a subclass of characteristic, so:

Meanwhile, a "phasal kind" enables an instance of a substantial kind (e.g. human) to also be an instance of a phasal kind (human with hair) that in theory doesn't interfere with the identity of the substantial kind. Thus phasal characteristics are axiomatized to exist only during some parts of a substantial kind entity’s history. The phasal kind likely includes an additional package of characteristic(s) (e.g. hair colour).

But instead I suggest for our qualitative and quantitative work we simply use a single “has characteristic” relation regardless of whether the range involves a phasal or substantial characteristic. We postpone adding “has [substantial / phasal] characteristic” or equivalent, since beyond setting the range, it doesn’t seem to enable us to infer anything more about the content of a characteristic description. I think a reasoner could infer and overlay those relations later in a knowledge graph, to show a use of ‘has characteristic’ can be inferred to be the more particular ‘has attribute’ if desired, using whatever logic we ourselves fashion to know when to invoke “has attribute” in the first place (namely, when a characteristic of a substantial kind is being referenced.)

This improves the readability of the "Values Without Attributes" section and of other places that mix attributes, characteristics, "has characteristic", and "has attribute".

ddooley commented 1 year ago

I think this diagram provides a cheat-sheet of most of James' comprehensive vision (but it replaces "attribute" with "characteristic"):

image

Mulling this over, I'd like to try considering a "value" as IAO information, treating it as a "characteristic specification", which is already hinted at in IAO with the narrower IAO "is quality specification of" [edited]. Maybe a generic "is specification of" would do for a label, or maybe "is value of" could still work. [edited to show "characteristic value", Bill's suggestion]

image

Now I recall that distinguishing a statement of fact from information was a desire, but I think the "detour into information land" - a simple local semantic that aboutness of a characteristic specification is the characteristic itself, at a minimum - will make it a short and acceptable trip?

Hopefully a characteristic specification still allows flexibility of "ascertained by" and all the possible provenance statements - measured, calculated, estimated, quoted from, etc.

Either way we still need to hash out how to represent more complex multicomponent characteristics like BMI and geographic coordinate.

wdduncan commented 1 year ago

@ddooley Thanks for putting together the diagrams!

I agree that using has characteristic/characteristic of to relate general properties (i.e., determinables) to specific instantiations of the properties (i.e., determinates) is more straightforward. For example, an entity's height over time could be represented like so:

:entity1 :has-characteristic :height_of_entity_1 .
:height_of_entity_1_at_time_1 :characteristic-of :height_of_entity_1 .
:height_of_entity_1_at_time_2 :characteristic-of :height_of_entity_1 .
etc.

I'm not following what you are wanting to use "characteristic specification" for. If you want to link the characteristic to literal values, can you make use of measurement datum (or something similar)? E.g.:

:measurement_datum_1 :is-about :height_of_entity_1_at_time_1 .
:measurement_datum_2 :is-about :height_of_entity_1_at_time_2 .
# add axioms with literal values related to :measurement_datum_1, :measurement_datum_2

If you don't like the is about relation, you can use owl:hasValue or some other OBI relation.
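The placeholder comment in the snippet above could be filled in along the following lines. This is a sketch only: it reuses the :has_quantity and :has_unit names from the quantity-value example earlier in this thread, the ex: namespace and the ex:is_about IRI are placeholders, and the number shown is simply the 65 inches from that earlier example.

```turtle
@prefix ex:  <http://example.org/> .               # hypothetical namespace
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:entity1 ex:has_characteristic ex:height_of_entity_1 .
ex:height_of_entity_1_at_time_1 ex:characteristic_of ex:height_of_entity_1 .

# The datum lives in "information land" and carries the literals.
ex:measurement_datum_1 ex:is_about ex:height_of_entity_1_at_time_1 ;
    ex:has_quantity "65"^^xsd:integer ;
    ex:has_unit "inches" .
```

Keeping the literals on the datum rather than on the characteristic is what lets the same height be recorded again later, or in other units, without touching the characteristic itself.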

ddooley commented 1 year ago

So that "characteristic specification" can be a measurement datum (output of assay) as well (as noted in diagram far right), but not necessarily. There was the desire to have these ICEs possibly be estimates, etc.

As for "is characteristic specification of", indeed it is directly related to "is about" object property:

The "is characteristic specification of" is a bit different from current "is about" subproperties "is quality specification of", and "is quality measurement of", in range and definitions that are specific to measurement processes and material transformation.

ddooley commented 1 year ago

But to reiterate - I'm not wedded to 'is characteristic specification of' etc. "Is about" might be too general, for example, insofar as a measurement datum can be about a characteristic, but also the material it's about, so how to distinguish the two?

wdduncan commented 1 year ago

I wasn't aware that measurement datum had to be the output of an assay. But, I suppose if you take 'assay' to be synonymous with 'evaluation', then it makes sense. In any case, you could use the more general data item.

Just to make sure I understand you: The motivation for is characteristic specification of is to have a more specific kind of is about relation that restricts the range to be a characteristic (i.e., data item is characteristic specification of characteristic). Right?

ddooley commented 1 year ago

Yeah, it's not axiomatized but "measurement datum" defn: "an information content entity that is a recording of the output of a measurement such as produced by a device". It does have an axiom requiring it to be about some material entity.

Right, range narrowed to characteristic. I just wanted to convey by "is characteristic specification of" something closer to "is value of" (inv. "has value"). Open to some other label. "is specification of characteristic"? "is specification of", "specifies". "is representation of" or "represents" / "has representation" could be promising (have to rename a certain OBI data property though if using that!)

ddooley commented 1 year ago

One other thing, one could easily enhance a characteristic specification directly with the time that the measurement was taken, or the time interval it is truthful of (though this info originates from the measuring process), and other contextual qualifications. The alternative is to have a phasal identifier that ties an instance of an entity (which the characteristic is about) to some place and interval of time.

wdduncan commented 1 year ago

@ddooley For relating things in "information land" (e.g., ICEs) to characteristics, would the labels 'has characteristic value' and 'is characteristic value' suffice?

ddooley commented 1 year ago

@wdduncan yes that could work (maybe 'is characteristic value of'). Avoids semantic overload of "has value / is value" if that is a concern to people.

James, are we bastardizing your creation!? 😮

ddooley commented 10 months ago

Various people have been using "measure of" and "has measure" in conjunction with characteristics and values. I like that, so have revised diagram:

image

matentzn commented 9 months ago

@ddooley Very nice image. Does this triangle

image

imply the following role chain:

'has specified output' o 'is measure of' subPropertyOf 'assay measures characteristic'
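In OWL 2 Turtle, a role chain of this shape could be sketched as below. The ex: IRIs are placeholders standing in for the real OBI and proposed property identifiers, which are not given in this thread, and "is measure of" in particular is only a tentative label.

```turtle
@prefix ex:  <http://example.org/> .               # hypothetical namespace
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ex:has_specified_output a owl:ObjectProperty .
ex:is_measure_of        a owl:ObjectProperty .

# 'has specified output' o 'is measure of' subPropertyOf 'assay measures characteristic'
ex:assay_measures_characteristic a owl:ObjectProperty ;
    owl:propertyChainAxiom ( ex:has_specified_output ex:is_measure_of ) .

ex:assay1 ex:has_specified_output ex:datum1 .
ex:datum1 ex:is_measure_of ex:height1 .

# With the chain axiom, an OWL 2 reasoner would entail:
#   ex:assay1 ex:assay_measures_characteristic ex:height1 .
```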

This would answer my question over in slack!

ddooley commented 9 months ago

Yes! Though "is measure of" / "measures" is a tentative proposal! "Assay measures characteristic" is in production.

matentzn commented 9 months ago

Very nice, looking forward to seeing this implemented!

ddooley commented 5 months ago

So some further feedback is needed. There's a bit of awkwardness about the fact that, because an OBI assay only outputs "data item", for assays that seem to output categoricals (think of an assay generating a size rank "large", "average", "small") we would have to point to an instance of a characteristic indirectly, as I think James foresaw.

image

To solve this, can we allow "has measure" (or James' "has value") to point to either an instance of a characteristic or a data item?

Also, can we say that a characteristic, being a specifically dependent continuant, pertains to some segment of a time:ProperInterval of the continuant? What relation can I attach between the characteristic and that segment? I've used "has characteristic" above; I can't apply "existence starts and ends during" since that only applies to material entities. The characteristic's interval isn't necessarily the entire continuant's existence, but does have a time:intervalIn relation to that entity's existent interval. This way we can more easily indicate start and end times of some kind of (organism) life stage characteristic - without having to include reference to an organism or its lifespan at the same time.
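To make the question concrete, here is one way the pieces might be laid out using OWL-Time. It is a sketch under assumptions, not a proposal: ex:holds_during is a made-up placeholder for exactly the relation being asked about, and the other ex: names are illustrative. Only the time: terms come from the W3C Time Ontology.

```turtle
@prefix ex:   <http://example.org/> .              # hypothetical namespace
@prefix time: <http://www.w3.org/2006/time#> .

ex:organism1 ex:has_characteristic ex:juvenile_stage1 .

# Placeholder relation from the characteristic to the interval it pertains to.
ex:juvenile_stage1 ex:holds_during ex:juvenile_interval1 .

ex:juvenile_interval1 a time:ProperInterval ;
    time:hasBeginning ex:juvenile_start ;
    time:hasEnd       ex:juvenile_end ;
    # The stage's interval falls within the organism's whole existence.
    time:intervalIn   ex:organism1_lifespan .

ex:organism1_lifespan a time:ProperInterval .
ex:juvenile_start a time:Instant .
ex:juvenile_end   a time:Instant .
```

With this layout the start and end of the life stage hang off the characteristic's own interval, so they can be stated without mentioning the organism or its lifespan.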