Using UCD in models - Githubissues

ivoa / dm-usecases

This repo gathers all the material to be used in the DM workshop 2020

The Unlicense


Using UCD in models #15

Open lmichel opened 3 years ago

lmichel commented 3 years ago

This thread continues the topic initiated [here](https://github.com/ivoa/dm-usecases/issues/12#issuecomment-795751659).

UCD tells more than measure type. UCDs are two-word labels, e.g. pos;meta.main. Therefore you cannot put UCDs in measures as built-in parameters. .... ....

lmichel commented 3 years ago

A challenge we have to face is to build a model encompassing any parameter one can find in astronomical catalogs.

The natural way to do this is to make one class for each quantity. This does not work because there are too many different sorts of parameters, and their number increases day after day.
The solution adopted by Mango is to use specific classes in a few cases (position ...) and to model the others with generic objects (meas:GenericMeasure).
The problem is now to get the physical meaning of a GenericMeasure. We need a semantic tag for doing this (e.g. "this measure is a magnetic field").

My proposal is to use the Uniform Content Descriptor (UCDs) for this.

Both @mcdittmar and @msdemlei claim that UCDs cannot be used in a model (I do not agree). The arguments are that the scope of the UCDs goes beyond the quantity roles and that we can get mismatches between UCDs and measure classes (e.g. a magnitude with ucd=pos.eq).

This issue has been tackled by the MANGO proposal, which enforces UCD prefixes for specific classes, so that a magnitude with ucd=pos.eq is not compliant with the model.
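
The prefix rule can be pictured with a small check. This is only a sketch under assumptions: the class names and the prefix table below are invented for illustration (in particular `meas:Magnitude`), and they are not taken from the MANGO draft.

```python
# Hypothetical table: UCD prefixes accepted for each specific measure class.
# Class names and prefixes are illustrative only, not taken from the MANGO draft.
ALLOWED_UCD_PREFIXES = {
    "meas:Position": ("pos.",),
    "meas:Time": ("time.",),
    "meas:Magnitude": ("phot.mag",),
}


def ucd_is_compliant(measure_class: str, ucd: str) -> bool:
    """Return True if the first UCD word fits the given measure class.

    Classes without an entry in the table (e.g. meas:GenericMeasure)
    accept any UCD in this sketch.
    """
    primary_word = ucd.split(";")[0].strip().lower()
    prefixes = ALLOWED_UCD_PREFIXES.get(measure_class)
    if prefixes is None:
        return True
    return any(primary_word.startswith(prefix) for prefix in prefixes)
```

With such a table, `ucd_is_compliant("meas:Magnitude", "pos.eq")` is rejected, while a generic measure keeps whatever UCD the annotator chose.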

It has also been proposed to use a vocabulary, but building a vocabulary that would be a subset of the UCDs is a job which is not worth it, unless it just maps UCDs (or a subset of them). This would be like hiding the use of UCDs.

Other suggestions?

mcdittmar commented 3 years ago

To state/recap my position more specifically.

Mango is mixing semantic modeling and formal vo-dml modeling techniques.

Parameter.semantic: is an entry from a semantic vocabulary identifying the role of the content in the Property. This equates to an attribute name (or vo-dml role) in a formally modeled Property type.
Parameter.ucd: is using the UCD vocabulary to identify the Type of the Property.measure element (mainly for when it is GenericMeasure class). This equates to the vo-dml type of the object at Property.measure.

UCD tells more than measure type. UCDs are two-word labels, e.g. pos;meta.main. Therefore you cannot put UCDs in measures as built-in parameters. The issue from my perspective is that UCD conveys more than the Type, and so is maybe not the best choice for this job. The concepts overlap, so it can get the job done, but the UCD is a tool which was designed for a different job.

"pos.eq" conveys the type as Position, but also information about the coordinate space

"pos;meta.main", the second word contains information regarding the role.. which overlaps with the purpose of Parameter.semantic

Using any semantic for this purpose is doing the same modeling work, but in a different way; i.e., applying ucd="phot.flux" to a meas:GenericMeasure indirectly defines FluxMeasure, a specialization of GenericMeasure. By not doing the formal vo-dml equivalent you:

- lose the benefits that come with it (auto-class generation, defining associated metadata (PhotCal), etc.)
- no longer have a model which defines what a "phot.flux" is. What are the expectations? Are there algorithm details? Was cos(dec) applied? Is that information to be recorded in the UCD document?
- still have the dependency on the Measure model version
- add a dependency on the UCD vocabulary version, which would have to take on the job of being updated to add new Measure types

The other part of my objection (left to the original issue) has to do with "whose job is it to identify the Type of the Measure?"
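
To make the word structure discussed above concrete (for example "pos;meta.main"), here is a minimal sketch that separates the first word of a UCD from the words that follow it. The `ParsedUcd` type and `parse_ucd` function are names made up for this illustration; the only thing taken from the UCD syntax is that words are separated by semicolons.

```python
from typing import NamedTuple, Tuple


class ParsedUcd(NamedTuple):
    """A UCD split into its first word and the remaining words."""
    primary: str
    qualifiers: Tuple[str, ...]


def parse_ucd(ucd: str) -> ParsedUcd:
    """Split a UCD on semicolons; raise ValueError if it has no word."""
    words = [word.strip() for word in ucd.split(";") if word.strip()]
    if not words:
        raise ValueError("empty UCD")
    return ParsedUcd(primary=words[0], qualifiers=tuple(words[1:]))


parsed = parse_ucd("pos;meta.main")
# parsed.primary is the part that overlaps with a type,
# parsed.qualifiers holds what goes beyond it (here: meta.main).
```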


msdemlei commented 3 years ago

On Fri, Mar 12, 2021 at 07:38:10AM -0800, Laurent MICHEL wrote:

> A challenge we have to face is to build a model encompassing any parameter one can find in astronomical catalogs.

My take is: This is a problem that immediately goes away when you properly define "build a model".

You see, as I've argued in the past few years, there is not one model for something you find in astronomical catalogues (let's call it a "column" here, but other entities are possible, too), but each column can be annotated using several different models.

If you have a column magK, it can be:

- the value in a photometry annotation (which may also define the bandpass, zero point, etc.)
- the value in a measurement annotation (which will link it to an err_magK and perhaps even correlations if applicable)
- one of the dependent_axis in an nDCube annotation
- the value in a characterisation annotation (which might give the legal and actual range of the thing, and perhaps some advanced statistics)
- the result (or whatever) in a provenance annotation that links this with an image and a software that did the source extraction

-- and so on.

Seen like this, there is no challenge -- simple, plain models automatically work for anything that matches their purpose. Which is one of the reasons I keep arguing for them.
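
As an illustration of one column carrying several independent annotations, here is a minimal sketch. The annotation type names (`phot:Photometry`, `ndcube:Cube`, ...) and the role names are placeholders invented for the example; they are not identifiers from any IVOA model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Maps a column name to the (annotation type, role) pairs that reference it.
# All type and role names used below are hypothetical placeholders.
ColumnAnnotations = Dict[str, List[Tuple[str, str]]]


def build_annotations() -> ColumnAnnotations:
    """Attach several unrelated annotations to the same magK column."""
    annotations: ColumnAnnotations = defaultdict(list)
    annotations["magK"].append(("phot:Photometry", "value"))
    annotations["magK"].append(("meas:Measurement", "value"))
    annotations["magK"].append(("ndcube:Cube", "dependent_axis"))
    annotations["err_magK"].append(("meas:Measurement", "error"))
    return annotations


def roles_in(annotations: ColumnAnnotations, column: str,
             annotation_type: str) -> List[str]:
    """Return the roles a column plays in annotations of one given type."""
    return [role for a_type, role in annotations.get(column, [])
            if a_type == annotation_type]
```

A client interested only in errors would ask for the `meas:Measurement` roles and ignore every other annotation on the column.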

lmichel commented 3 years ago

I understand Mark's point of view, though I do not follow it totally.

You are right if we consider that Property.ucd is just here to define measure types: UCDs do more, and restricting their role to a sort of class type is misleading. So let's figure out a pattern that satisfies both needs: modeling measure types and describing the physical meaning of measures.

1- Providing types to measures

- For some of them this is done by the object classes (position, pm ...): nothing extra to do.
- For the others (GenericMeasure) we need a specific tag; let's call it MeasureType.

The requirements for MeasureType are:

- Open-ended vocabulary
- Extendable without model update
- To be set as a meas:Measure attribute to make the Measurement model self-consistent

The good way to proceed would be to derive this vocabulary from the UCD list by just keeping the relevant fragments (e.g. magField).

2- Describing the measure content

As said, we also need to set the roles of the measures.

Actually this is not the purpose of mango:Property.semantic. Property.semantic is here to allow connecting a measure with a vocabulary term. For now I have no concrete example working with Property.semantic, so I propose to push it aside (maybe to remove it).

I insist that the best candidate to describe the measure role is the UCD.

In this configuration, the measurement model would be able to associate a type with GenericMeasure, and Mango would carry the complete descriptions of the measures with UCDs. Making sure that UCDs are consistent with the measure objects they are associated with is the responsibility of the data annotator. This risk of breaking consistency exists each time data are tagged with semantic data; it is not specific to our case.

lmichel commented 3 years ago

Markus,

> My take is: This is a problem that immediately goes away when you properly define "build a model".

I answered your take in advance in this post.

To me, the model must be valid, self-consistent and usable outside the scope of VOTable parsing. This is why it cannot be designed as a set of independent objects spread over VOTable FIELDs.

We have a model for column quantities: MCT. We need a model providing a scheme to put these quantities together, as well as extra information.

Bonnarel commented 3 years ago

I've read this thread and also the long #12 one. First of all, I disagree that UCDs mix quantity types and roles. They are quantity types. The combination feature with the semicolon was there to refine a quantity type by using another complementary one. UCDs allow comparing column contents through a universal physical quantity annotation outside any data model context. Initially the "role" of a FIELD in VOTable was to be given by the utype attribute ;-). Let's not speak anymore about utypes. (fb : sad face ;-) ) Now the "role" should be given by a combination of vodml-roles in the annotation.
Or if not, what kind of other "role" do you have in mind?

This said, I think I follow Laurent for the remaining part.

a) We have specialized mango:parameters using per-physics Measures. Now we know that PhotometricFlux can be one of those (we just have to add it to Measure).
b) In case we find some odd measurements, just use a Parameter with GenericMeasure, and we have the ucd attribute in mango:Parameter to tell us what it is.
c) We make the two subcases consistent by SUBSETTING the Parameter.ucd to some specific value consistent with the Parameter and Measure type when it is a per-physics one.
d) This may be a coarse-grained UCD compared to the one used in VOTable on each FIELD (but they should be consistent).
e) The ucd attribute in the Parameter class is strongly useful for non-VOTable serializations.
f) We have such ucd attributes in classes not only in Provenance but also in ObsCore. The o_ucd is typically there to tell us the quantity type used as an Observable in the dataset.

msdemlei commented 3 years ago

On Mon, May 03, 2021 at 10:47:22AM -0700, Bonnarel wrote:

> First all, I disagree that ucd mix quantity types and roles. They are quantity types. The combination feature with semi column was

Good -- and we should keep it that way. We have sinned a bit with meta.main, but that shouldn't be a reason to re-invent UCDs in model hierarchies.

> b ) IN case we find some odd measurements just use a Parameter with Generic Measure and we have the ucd attribute in mango:Parameter to tell us what it is.

Well, actually we already have @.*** -- let's see how far we get with that.

> d) this may be a ucd coarse grained compared to the one used in VOTable on each FIELD (but they should be consistent)

Um... why would someone use different UCDs in DM annotation and on the field? What would clients be supposed to do in such a case?

> e ) the ucd attribute in the Parameter class is strongly useful for non VOTAble serializations

Perhaps, but then it's easy to define UCD attachment per such serialisation. There's no need to encumber our DMs with that.

But maintaining UCDs in DMs is really a side show. The main issue with current Meas is the large number of subclasses for which I still haven't seen a credible use case (over just using @.***).

And that's a large problem, because they're actually heavily damaging a fundamental and attractive baseline use case (cf. issue #36): Let a client figure out errors.

That one is rather dear to me because (learn from STC-1) this can greatly help uptake: Data providers will love it when TOPCAT automatically plots error bars on their data sets. If we make that cheap and easy, chances that people will actually add annotation to what they produce get a lot better.

With just one Measurement class (or perhaps a few when we add proper distributions), with the API of https://github.com/msdemlei/astropy, all TOPCAT would need to do is:

ann = col.get_annotations("meas:Measurement")
if ann:
  associated_error = ann.naive_error

(or it would try a few attributes it knows how to plot).

With current Meas, it will, as far as I can see, have to do something like

MEAS_CLASSES = ["meas:GenericMeasure", "meas:Time",
    "meas:Position", "meas:Velocity", "meas:ProperMotion"]
# I'm leaving out Polarisation because it really doesn't belong here 

for class_name in MEAS_CLASSES:
  ann = col.get_annotations(class_name)
  if ann:
    associated_error = ann.naive_error

And, worse, each time we invent a new Measure subclass, it will have to amend MEAS_CLASSES.

That's a high price to pay; it would be worth paying if we got a major benefit from it. But I can't even see a minor one.

Of course, you could tell Mark to

(a) retrieve the VO-DML (b) parse it (c) derive the MEAS_CLASSES from the class hierarchy.

(or, equivalently, have the VOTable library do that, and have a new get_annotation_subclass API function).

That's an even higher price, not only because of all the extra code for VO-DML processing (lesson from STC-1: make takeup easy and cheap) but also because you start having client code retrieve stuff from ivoa.net in normal operations (lesson from Registry: That's a pain; don't get me started on the fun I keep having with validators having to pull schema documents from our schema repo). Again: there are cases when that may be a price worth paying. But here, I can't see a proportional benefit.
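
For a sense of what step (c) would involve once the VO-DML has been retrieved and parsed, here is a rough sketch. The `PARENT_OF` table is a hand-written stand-in for the parsed class hierarchy (an assumption of this example, not something read from a real VO-DML document), and `subclasses_of` is a hypothetical helper, not an existing library function.

```python
from typing import Dict, List

# Hypothetical child -> parent table, standing in for what a client would
# extract from parsed VO-DML; it is not read from any real document.
PARENT_OF: Dict[str, str] = {
    "meas:GenericMeasure": "meas:Measure",
    "meas:Time": "meas:Measure",
    "meas:Position": "meas:Measure",
    "meas:Velocity": "meas:Measure",
    "meas:ProperMotion": "meas:Measure",
}


def subclasses_of(base: str, parent_of: Dict[str, str]) -> List[str]:
    """Return every class whose ancestor chain reaches `base`, sorted by name."""
    found = []
    for name in parent_of:
        ancestor = parent_of.get(name)
        seen = set()  # guards against accidental cycles in the table
        while ancestor is not None and ancestor not in seen:
            if ancestor == base:
                found.append(name)
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
    return sorted(found)


MEAS_CLASSES = subclasses_of("meas:Measure", PARENT_OF)
```

The traversal itself is short; the cost described above lies in obtaining and parsing the VO-DML that would fill the table.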

Bonnarel commented 3 years ago

Hi Markus,

On 04/05/2021 at 10:45, msdemlei wrote:

> On Mon, May 03, 2021 at 10:47:22AM -0700, Bonnarel wrote:
>
> > First all, I disagree that ucd mix quantity types and roles. They are quantity types. The combination feature with semi column was
>
> Good -- and we should keep it that way. We have sinned a bit with meta.main, but that shouldn't be a reason to re-invent UCDs in model hierarchies.

OK

> > b ) IN case we find some odd measurements just use a Parameter with Generic Measure and we have the ucd attribute in mango:Parameter to tell us what it is.
>
> Well, actually we already have @.*** -- let's see how far we get with that.

Sorry, I don't understand what you are talking about. Can you be more explicit?

> > d) this may be a ucd coarse grained compared to the one used in VOTable on each FIELD (but they should be consistent)
>
> Um... why would someone use different UCDs in DM annotation and on the field? What would clients be supposed to do in such a case?

OK. My point was unclear, I admit it.

The ucd in Mango:parameter is not associated with a single FIELD or PARAM but with a group of them (a group in the plain English sense, not always a VOTable GROUP), as is the Measure.

The FIELDs embedded in that mango:parameter have their own UCDs, which can be more accurate.

But they have to be consistent, of course.

> > e ) the ucd attribute in the Parameter class is strongly useful for non VOTAble serializations
>
> Perhaps, but then it's easy to define UCD attachment per such serialisation. There's no need to encumber our DMs with that.
>
> But maintaining UCDs in DMs is really a side show. The main issue with current Meas is the large number of subclasses for which I still haven't seen a credible use case (over just using @.***).

See my question above.

> And that's a large problem, because they're actually heavily damaging a fundamental and attractive baseline use case (cf. issue #36): Let a client figure out errors.
>
> That one is rather dear to me because (learn from STC-1) this can greatly help uptake: Data providers will love it when TOPCAT automatically plots error bars on their data sets. If we make that cheap and easy, chances that people will actually add annotation to what they produce get a lot better.
>
> With just one Measurement class (or perhaps a few when we add proper distributions), with the API of https://github.com/msdemlei/astropy, all TOPCAT would need to do is:
>
>     ann = col.get_annotations("meas:Measurement")
>     if ann:
>       associated_error = ann.naive_error
>
> (or it would try a few attributes it knows how to plot).
>
> With current Meas, it will, as far as I can see, have to do something like
>
>     MEAS_CLASSES = ["meas:GenericMeasure", "meas:Time",
>         "meas:Position", "meas:Velocity", "meas:ProperMotion"]
>     # I'm leaving out Polarisation because it really doesn't belong here
>
>     for class_name in MEAS_CLASSES:
>       ann = col.get_annotations(class_name)
>       if ann:
>         associated_error = ann.naive_error
>
> And, worse, each time we invent a new Measure subclass, it will have to amend MEAS_CLASSES.
>
> That's a high price to pay; it would be worth paying if we got a major benefit from it. But I can't even see a minor one.

But the code for getting back the error should be the same for all the subclasses, no?

Apart from polarization there are absolutely no constraints on Error for any of the subclasses (which is perfectly understandable).

The code for getting the error from any of the subclasses is typically the code for the GenericMeasure, isn't it?

So why is it so large a problem?


lmichel commented 3 years ago

Speaking for MANGO, I admit that the current draft has too many per-domain classes (see MANGO issues). My point of view:

- Use a per-domain class when it is necessary:
  - the domain comes with specific frame classes (e.g. Time)
  - specific set of coordinates (e.g. 2 coordinates for one position)
  - this allows validators to check that positions are bound with SpaceFrame and not with TimeFrame
- Use a generic measure anywhere else:
  - no frame
  - one single coordinate

Having said that, we need a way to give a role to those generic measures. I reaffirm that using UCDs for this is not only valid but also smart. This is why MANGO requires UCDs for each parameter.

In the context of a MANGO-annotated VOTable, UCDs could be set either as references to FIELD@ucd (see [here](https://github.com/ivoa/dm-usecases/issues/23#issuecomment-832516736)) or as literals:

- by reference when the column and MANGO parameter UCDs match
- as literals in any other case:
  - UCD not provided in the VOTable
  - UCDs do not match (e.g. pos.eq.ra and pos.eq.dec for the position fields vs pos.eq for the MANGO position Parameter)
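
The two ways of setting the UCD can be sketched as follows. `UcdSetting` and `resolve_ucd` are hypothetical names made up for this illustration, and the dictionary of FIELD UCDs stands in for whatever a VOTable parser would provide; none of this is the MANGO mapping syntax.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UcdSetting:
    """UCD of a MANGO parameter: a literal, or a reference to a FIELD."""
    literal: Optional[str] = None
    field_ref: Optional[str] = None  # ID of the FIELD whose ucd is reused


def resolve_ucd(setting: UcdSetting, field_ucds: Dict[str, str]) -> str:
    """Return the literal if given, else the ucd of the referenced FIELD."""
    if setting.literal is not None:
        return setting.literal
    if setting.field_ref is None:
        raise ValueError("neither a literal nor a FIELD reference was given")
    if setting.field_ref not in field_ucds:
        raise ValueError(f"FIELD {setting.field_ref!r} carries no UCD")
    return field_ucds[setting.field_ref]


# Hypothetical FIELD ID -> ucd table for one annotated table.
FIELD_UCDS = {"ra": "pos.eq.ra", "dec": "pos.eq.dec", "magK": "phot.mag;em.IR.K"}

# Column and parameter UCDs match: set by reference.
mag_ucd = resolve_ucd(UcdSetting(field_ref="magK"), FIELD_UCDS)
# Position parameter spans two FIELDs with finer UCDs: set as a literal.
pos_ucd = resolve_ucd(UcdSetting(literal="pos.eq"), FIELD_UCDS)
```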

lmichel commented 3 years ago

The current list of per-domain classes is rather limited (position, pm, veloc, time, luminosity). We can expect to be prompted to extend it in the foreseeable future:

- Planetary data
- Complex shaped objects
- Multi-messenger data

Mango has a placeholder for this. Note that adding new per-domain classes to MANGO would only trigger minor revisions that wouldn't break existing stuff.

mcdittmar commented 3 years ago

I can agree with most of this. My only assertion is that if UCD is considered the solution for defining what the GenericMeasure holds, that should be assigned to the GenericMeasure. Other usage of GenericMeasure (in Cube for example), will have the same question. The same GenericMeasure will not be a "phys.energy" in Source but a "stat.snr" in Cube.


msdemlei commented 3 years ago

On Wed, May 05, 2021 at 01:36:50AM -0700, Bonnarel wrote:

> On 04/05/2021 at 10:45, msdemlei wrote:
>
> > Well, actually we already have @.*** -- let's see how far we get with that.
>
> Sorry I don't understand what you are talking about. Can you be more explicit ?

Ah... that was "the ucd attribute of FIELD" (and PARAM) in xpath notation before github's spam obfuscator ate it.

All I was saying is: We have a perfectly good place to store UCDs. Before we create another place, let's have crystal-clear use (!) cases that that place isn't good enough any more.

> > > d) this may be a ucd coarse grained compared to the one used in VOTable on each FIELD (but they should be consistent)
> >
> > Um... why would someone use different UCDs in DM annotation and on the field? What would clients be supposed to do in such a case?
>
> OK. My point was unclear, I admit it.
>
> The ucd in Mango:parameter is not associated to a single FIELD or PARAM but to a group of them. (a group english word, not always a VOTable GROUP) as is the Measure.

Yeah, that would be a clear case, except I can't see a use case for these ucds on Measure (or whatever).

On the contrary, I expect them to lead to rather confusing situations. Say you have a Photometric measurement. Its "group" UCD would presumably be something like phot.mag;em.opt.V, right?

But now it would group two columns, the value and an error. These would then have UCDs phot.mag;em.opt.V and stat.error;phot.mag;em.opt.V. Don't you agree it's a bit odd to repeat the UCD on one of the reference fields, and to have a different one on the other?

And of course, as usual: What would clients do with the UCD on Measure that they could not (possibly better) do with the UCD on the Measurement's value?
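
What "consistent" means between a group-level UCD and the UCDs of the grouped fields is not spelled out in this thread. One possible reading, sketched below purely as an assumption, is that every word of the group UCD must appear in the field UCD, either identically or as a dotted prefix (so that pos.eq is matched by pos.eq.ra). The function names are made up for the example.

```python
from typing import List


def ucd_words(ucd: str) -> List[str]:
    """Split a UCD on semicolons and normalise case."""
    return [word.strip().lower() for word in ucd.split(";") if word.strip()]


def refines(field_word: str, group_word: str) -> bool:
    """True if field_word equals group_word or extends it with more atoms."""
    return field_word == group_word or field_word.startswith(group_word + ".")


def is_consistent(group_ucd: str, field_ucd: str) -> bool:
    """Assumed rule: each group word is matched by some field word."""
    field_words = ucd_words(field_ucd)
    return all(
        any(refines(field_word, group_word) for field_word in field_words)
        for group_word in ucd_words(group_ucd)
    )
```

Under this reading both the value column (phot.mag;em.opt.V) and the error column (stat.error;phot.mag;em.opt.V) are consistent with a group UCD of phot.mag;em.opt.V, which also shows how little the group UCD adds beyond the value column's own UCD.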

> >     MEAS_CLASSES = ["meas:GenericMeasure", "meas:Time",
> >         "meas:Position", "meas:Velocity", "meas:ProperMotion"]
> >     # I'm leaving out Polarisation because it really doesn't belong here
> >
> >     for class_name in MEAS_CLASSES:
> >       ann = col.get_annotations(class_name)
> >       if ann:
> >         associated_error = ann.naive_error
> >
> > And, worse, each time we invent a new Measure subclass, it will have to amend MEAS_CLASSES.
> >
> > That's a high price to pay; it would be worth paying if we got a major benefit from it. But I can't even see a minor one.
>
> But the code for getting back the error should be the same for all the subclasses, no ?

I'd hope so, yes.

> The code for getting the error from any of the subclasses is typically the code for the GEneric measure isn't it ?

Again, that's my understanding, and part of the reason why I can't see why we'd want these additional classes.

> So why is it so large a problem?

Because getting, managing, and parsing VO-DML is a big deal that people just wanting to find the error for a value shouldn't have to do if we can help it. And manually enumerating all the sub-classes is brittle and asking quite a bit of our adopters, in particular when at least I can't explain why these sub-classes are there to begin with.

lmichel commented 3 years ago

Having UCDs on GenericMeasure only makes sense, but it would require the Meas model to do it. For now UCDs are carried by MANGO, and this concerns all measure classes by construction.

Two points in answer to Markus about the UCD repetitions:

- The model must be consistent even outside the scope of a VOTable. I must be able to build a model instance, even on paper, that contains all the elements I need to describe my data. Hence I cannot rely on some underlying VOTable to design the model.
- Not repeating UCDs in both the FIELD and the mapping block is a matter of syntax (see this open post).

Bonnarel commented 3 years ago

> On Wed, May 05, 2021 at 01:36:50AM -0700, Bonnarel wrote:
>
> > On 04/05/2021 at 10:45, msdemlei wrote:
> >
> > > Well, actually we already have @.*** -- let's see how far we get with that.
> >
> > Sorry I don't understand what you are talking about. Can you be more explicit ?
>
> Ah... that was "the ucd attribute of FIELD" (and PARAM) in xpath notation before github's spam obfuscator ate it.
>
> All I was saying is: We have a perfectly good place to store UCDs. Before we create another place, let's have crystal-clear use (!) cases that that place isn't good enough any more.

I don't think we propose to replace the ucd attribute on FIELDs/PARAMs. What is in Mango is a wider usage of UCDs as semantic tags. Both usages have to be consistent when appropriate.

> > > d) this may be a ucd coarse grained compared to the one used in VOTable on each FIELD (but they should be consistent)
> >
> > Um... why would someone use different UCDs in DM annotation and on the field? What would clients be supposed to do in such a case?
>
> > OK. My point was unclear, I admit it. The ucd in Mango:parameter is not associated to a single FIELD or PARAM but to a group of them. (a group english word, not always a VOTable GROUP) as is the Measure.
>
> Yeah, that would be a clear case, except I can't see a use case for these ucds on Measure (or whatever).
>
> On the contrary, I expect them to lead to rather confusing situations. Say you have a Photometric measurement. Its "group" UCD would presumably be something like phot.mag;em.opt.V, right?
>
> But now it would group two columns, the value and an error. These would then have UCDs phot.mag;em.opt.V and stat.error;phot.mag;em.opt.V. Don't you agree it's a bit odd to repeat the UCD on one of the reference fields, and to have a different one on the other?
>
> And of course, as usual: What would clients do with the UCD on Measure that they could not (possibly better) do with the UCD on the Measurement's value?

The code can prepare to manage anything "below" this parameter as a Photometric Measurement.

Bonnarel commented 3 years ago

> Having UCDs on GenericMeasure only makes sense, but would require Meas model to do it. For now UCDs are carried by MANGO and this concern all measure classes by construction.

Is it possible to subset the mango:parameter ucd attribute value for per-physics classes?

> 2 elements answering Markus about the UCD repetitions :
>
> The model must be consistent even out of the scope of a VOtable. I must be able to build a model instance, even on paper, that contain all element I need to describe my data. Hence I cannot rely to some underlying VOTable to design the model.
>
> Not repeating UCDs in both FIELD and mapping block is a matter of syntax. (see this open post)

+1 : good summary Laurent