ivoa-std / MANGO

Model for Annotating Generic Objects in VOTables
Parameter: content #26

Closed mcdittmar closed 1 month ago

mcdittmar commented 3 years ago

Sorry.. this is a long one.

The Source -> Parameter relation is very similar to the Cube model’s NDPoint -> Observable relation. In Cube, each Observable owns a Measure instance, and adds knowledge of whether this is ‘dependent’ or ‘independent’ data. But here, each Parameter ends up taking the place of VODML role and type from a formally modeled Source object.
ie: instead of Source.position:Position[*] we have Source with Parameter{semantic=“source position”, ucd=“pos.eq”}

I understand that this model is trying to be generic, and specifically NOT to model Source explicitly, so I think "Source has a collection of Property-s" is a good mechanism. But instead of providing access to various types of Properties, it has become something that lets you build proxies for things which are not formally modeled, which I think is outside the model scope.

Parameter.semantic:

Parameter.ucd:

Parameter.measure:

Proposal:

lmichel commented 3 years ago

Sorry.. this is a long reply.

The scope of Mango is to provide a way for clients to "understand" all properties attached to one given dataset. The datasets that could be mapped onto Mango are so diverse that we cannot consider building a classical model that binds sources to a predetermined set of properties with specific roles.

To work around this, MANGO considers a source as an open set of properties (not speaking about associated data).

To make this work we need

The way it has been done:

If things are well done, we should be able to propose a MANGO API looking like this:

instance = Mango.get_instance(votable)
# Get all semantic blocks
available_properties = instance.get_properties()
# Get a specific measure
measure = instance.get_property("my.nice.ucd", vocabulary=None, description=None)
error = measure.get_error()
value = measure.get_value()
coord_system = measure.get_coordsys()
frame = coord_system.get_frame()

This is valid for any measure, which is rather cool.
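
To make the snippet above concrete, here is a minimal, self-contained sketch of what such an interface could look like. Everything in it is hypothetical: the class names `CoordSys`, `Measure` and `MangoInstance`, their fields, and the sample values are illustrative assumptions, not part of any existing MANGO implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoordSys:
    """Hypothetical coordinate system wrapper; only the frame is modelled."""
    frame: str

    def get_frame(self) -> str:
        return self.frame


@dataclass
class Measure:
    """Hypothetical common interface shared by every property."""
    ucd: str
    value: float
    error: Optional[float] = None
    coordsys: Optional[CoordSys] = None

    def get_value(self) -> float:
        return self.value

    def get_error(self) -> Optional[float]:
        return self.error

    def get_coordsys(self) -> Optional[CoordSys]:
        return self.coordsys


class MangoInstance:
    """Hypothetical container: a source seen as an open set of properties."""

    def __init__(self, properties: List[Measure]):
        self._properties = list(properties)

    def get_properties(self) -> List[Measure]:
        return list(self._properties)

    def get_property(self, ucd: str) -> Optional[Measure]:
        # Return the first property tagged with the requested UCD, if any.
        for prop in self._properties:
            if prop.ucd == ucd:
                return prop
        return None


# Illustrative values only.
instance = MangoInstance([
    Measure("pos.eq.ra", 10.68, error=0.01, coordsys=CoordSys("ICRS")),
    Measure("phot.mag", 17.2, error=0.05),
])
position = instance.get_property("pos.eq.ra")
frame = position.get_coordsys().get_frame()
```

In this sketch a property without a coordinate system simply returns `None` from `get_coordsys()`, so the same calls can be made on any property and the client decides what to do with missing parts.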

I took a bit of time to recap this because I would really like to avoid amending the model in a way that breaks this homogeneity.

lmichel commented 3 years ago

Parameter.semantic:

You are right. As the content is totally free, this field cannot be assumed to give a role. It must be seen as a secondary qualifier, something that helps some clients. I think I should make it optional, and the spec must be refined as well.

lmichel commented 3 years ago

Parameter.ucd:

Replaces VODML type (ie: expected Type of the ‘value’)
Has the benefit of facilitating the use of concepts with no formally modeled Measure type; “phys.magField”, “phot.mag”..
    I’ll note that I believe this is Markus’ argument for not having specialized Measure types at all, but only a single Measurement with a semantic tag to identify its nature (ala ucd).

I'm somewhere in between Markus and you: UCDs for roles, but we need a model for the structure of the measures.

    In my opinion, this form may be fine for a serialization, but is VERY difficult to specify dependencies/constraints in the models
        If ucd = “pos.eq” then associated Coordinate SpaceFrame MUST have referenceFrame=“ICRS|FK4|FK5” and Spherical coordinate space
Has the vulnerability of being a consistency problem
    If ucd = “pos.eq” and the measure is “meas:Position but in GALACTIC”, the client will have to handle the inconsistency

It is true. I would say this is the cost of flexibility. This problem arises each time the same thing has more than one identifier (UCD + dmtype here). Note that a classical model does not prevent this; it just shifts the risk onto the mapping (you can map pos.eq onto a galactic position).
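
A client that wants to detect this kind of mismatch could keep a small rule table, along the lines of the sketch below. The `pos.eq` rule is the one quoted above (ICRS, FK4 or FK5); the table name, the function name and the choice to let unknown UCDs pass are assumptions made for illustration only.

```python
# Hypothetical rule table: UCD stem -> reference frames considered consistent.
# Only the pos.eq rule comes from this discussion; a real table would be larger.
ALLOWED_FRAMES = {
    "pos.eq": {"ICRS", "FK4", "FK5"},
}


def frame_is_consistent(ucd: str, reference_frame: str) -> bool:
    """Return False only when a known rule is violated; unknown UCDs pass."""
    for stem, frames in ALLOWED_FRAMES.items():
        # Match the stem itself or any more specific UCD below it (pos.eq.ra).
        if ucd == stem or ucd.startswith(stem + "."):
            return reference_frame in frames
    return True
```

With these rules, `frame_is_consistent("pos.eq", "GALACTIC")` reports the inconsistency, while a UCD with no registered rule is accepted as is.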

    If ucd = “phot.mag” and the measure is “meas:GenericMeasure”, the client STILL needs to do all the work to determine if the GenericMeasure content is compatible with “phot.mag” type. If they are doing that, then they can identify it as a “phot.mag” without the prompt. NOTE: doing this MAY mean drilling down to the VOTable element, and checking the UCD on the PARAM|FIELD.. noticing that it is “phot.mag”
Having the ucd here does not solve the GenericMeasure problem, since it does not help identify dependent metadata
    If Parameter.ucd = “phot.mag” or “phot.flux” there should/must be an associated “photDM.PhotCal” instance.. how do they know that? where would they find it? This exact scenario is in the TimeSeries workshop use case.

Mango has no concept like (in)dependent metadata.

I'm not sure I follow you. Taking inspiration from the code snippet above, you could do a check like this:

if generic_measure.get_coordsys()["@dmtype"] == "PhotometricSys":
    print("this measure really looks like a photometric measure")

What can we do if the curator randomly mixes up data UCDs and classes? Mango is a very flexible model designed to map various data, but it relies on the thoroughness of the data provider. This can be seen as a weakness, but to me the benefit/cost ratio is more than positive.

lmichel commented 3 years ago

Parameter.measure:

Is the parameter value, which may or may not be of the type identified in the ucd
    This can be a good thing ( qualifying GenericMeasure as “phot.flux” or “phys.magField” )
    Or a consistency problem ( ucd=“pos.eq” with measure=Time )

This can easily be checked (see previous post)

There is only 1 option here.. Parameter contains Measure
    The model text describes that there are other kinds of parameters ( flags, assigned states, classifications ). By only having a Measurement option, the model has improperly extended Measure and Coordinate for these data. That will be another ticket, but I think there is work to do here on how to handle non-measure properties.

I admit that the way I extend Measure/Coordinate might look odd, but I claim it is valid. What I'm doing with flags is not that different from what you propose for the Polarimetry.

Mango needs a common interface for all measures (including flags, assigned states, classifications). We can imagine an intermediate layer providing that interface for different categories of measure, but what would be the gain? I admit, however, that the term "measure" is not the best choice in this case. I have found nothing better so far.

lmichel commented 3 years ago

Proposal:

I would suggest splitting the Parameter into sub-classes
    Parameter: abstract parent. contains reference to associated parameter if that is needed (haven’t looked into that use case)
    PhysicalParameter: extends Parameter, contains Measure instance
    Classification: extends Parameter, contains a vocabulary literal (VocabularyTerm)
        removes need for VocabMeasure and VocabCoordinate which are not proper extensions of those models
    Flag: extends Parameter, contains what basically amounts to a user-defined enumeration value
        value = integer (OK to start, but in Chandra we have bit array flags where each bit represents a different issue )
        options = pointer to what is currently defined as FlagSys
        Removes FlagCoord, FlagSys becomes local class as part of Flag Property spec, not extension of CoordSys

This may work, but I do not see the benefit of such a complication.
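
For reference, the proposed split could be sketched as follows. This is only an illustration under stated assumptions: the `Measure` stand-in, the field names, and the `QualityBits` bit names are invented here and do not come from the Measurement model or from Chandra.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


@dataclass
class Parameter:
    """Abstract parent; may reference an associated parameter."""
    associated: Optional["Parameter"] = None


@dataclass
class Measure:
    """Stand-in for a Measurement-model instance (hypothetical fields)."""
    value: float
    error: Optional[float] = None
    frame: Optional[str] = None


@dataclass
class PhysicalParameter(Parameter):
    """Extends Parameter; contains a Measure instance."""
    measure: Optional[Measure] = None


@dataclass
class Classification(Parameter):
    """Extends Parameter; contains a vocabulary literal."""
    term: str = ""


class QualityBits(IntFlag):
    """Invented bit-array flag set; each bit represents a different issue."""
    SATURATED = 1
    CONFUSED = 2
    EXTENDED = 4


@dataclass
class Flag(Parameter):
    """Extends Parameter; contains a user-defined enumeration value."""
    value: int = 0


# Illustrative values only.
flux = PhysicalParameter(measure=Measure(3.2e-14, error=4.0e-15))
morphology = Classification(term="spiral")
quality = Flag(value=int(QualityBits.SATURATED | QualityBits.EXTENDED),
               associated=flux)
decoded = QualityBits(quality.value)  # recover the individual bits
```

In this layout a `Classification` has no measure, error or frame at all, so the common access pattern stops at the `Parameter` level and clients branch on the subclass.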

None of these would have ‘semantic’ or ‘ucd’ attributes to qualify the value.

There are thousands of different UCDs, and we need them. The information carried by MCT classes is not enough.

    In the PhysicalParameter we’d need to have a discussion on how to handle the complex unmodeled Measure types.
        The ‘simple’ ones, can be handled by clients interpreting units and/or the underlying VOTable element ucd.

mcdittmar commented 3 years ago

Laurent,

The scope of Mango is to provide a way for clients to "understand" all properties attached to one given dataset

Paraphrased, I'd say the goal is "to model Source and its various Properties"

To work around this, MANGO considers a source as an open set of properties

Right.. so at first level you have Source has a collection of Property-s

But then, when you look at the kinds of properties, there are at least 2 inherently different categories

  1. those based on physical entities, either measured or derived (Position, Time, Flux, Magnitude, HardnessRatio..). Whose values reside in a particular coordinate space, etc
  2. those which identify what kind of thing (Source) we have (SpectralType, CelestialClass, LuminosityClass, MorphologyClass). Whose values are just an entry from a controlled vocabulary.

I don't think you should necessarily expect the interface to these to be the same..

MANGO API code block example

In my opinion, that thread should utterly fail for the 2nd type. It makes no sense to examine the coordSys or RefFrame or Errors of a MorphologyClass. They simply don't apply.

I admit that the way I extend Measure/Coordinate might look odd, but I claim it is valid.

I assure you that Measurement was not intended to be extended in this way. Keep in mind that the Measurement model is designed to support the Cube case, which has a lot of parallels with this one. It too has Quality Flags and other sorts of qualifiers which are not covered by the Measurement model, because they are not within its scope.

What I'm doing with flags is not that different from what you propose for the Polarization (sp).

This is true.. I was very uncertain about including the enumerated PolarizationState in the Measurement model for this very reason.. it is technically not a measured entity, but an assigned state. I included it because

  1. we haven't really broached this subject in the DM group until now
  2. I feel it is more closely tied to a physical property than other classifications/flags
  3. I expect PolarizationFraction to become a use case in the future, and would like to allow both to be under the same Polarization type.

Given this discussion, I could be convinced to reconsider that choice.

quality flag discussion

The usage threads related to Source data often include evaluating the quality of, or usefulness of the Properties. These are externally determined and assigned to a particular Property (or Source record as a whole?). I assert that this is a more important relation than a simple 'Associated Property'. The flag has little meaning on its own, but is a qualifier on the Property to which it is assigned.

This is why I suggest the flag is not a Property itself (you aren't going to perform analysis on the Flag), and would be better modeled as an attribute on Property (maybe just PhysicalProperty) so that there is a common access point to this very important qualifier.
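
A rough sketch of what "flag as an attribute on Property" might look like is given below. The class names, the `flags` field, and the convention that code 0 means "no issue" are all assumptions made for illustration; nothing here is taken from the MANGO draft.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityFlag:
    """Hypothetical qualifier assigned to a property from outside."""
    code: int
    meaning: str


@dataclass
class PhysicalProperty:
    """Hypothetical property carrying its own quality flags."""
    name: str
    value: float
    flags: List[QualityFlag] = field(default_factory=list)

    def is_usable(self) -> bool:
        # Assumed convention: code 0 means "no issue"; no flags also passes.
        return all(flag.code == 0 for flag in self.flags)


# Illustrative values only.
flux = PhysicalProperty("flux", 3.2e-14,
                        flags=[QualityFlag(2, "confused source")])
position = PhysicalProperty("position", 10.68)
usable = [p.name for p in (flux, position) if p.is_usable()]
```

The point of the attribute is the single access path: a client filtering on quality asks each property directly instead of searching the source for associated flag properties.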

re: suggested restructuring of Parameter/Property and non-coordinate Measure-s "This may work, but I do not see the benefit of such a complication."

The suggested changes:

lmichel commented 3 years ago

Paraphrased, I'd say the goal is "to model Source and its various Properties"

This is rather a semantic shift. I prefer to keep my "model for source data", with an acronym standing for "Model for Annotating Generic Objects".

lmichel commented 3 years ago

But then, when you look at the kinds of properties, there are at least 2 inherently different categories

those based on physical entities, either measured or derived (Position, Time, Flux, Magnitude, HardnessRatio..). Whose values reside in a particular coordinate space, etc
those which identify what kind of thing (Source) we have (SpectralType, CelestialClass, LuminosityClass, MorphologyClass). Whose values are just an entry from a controlled vocabulary.

I don't think you should necessarily expect the interface to these to be the same..

Since there is no way for me to sell my generalisation of Syst/Frame, I would say that is a promising approach.

I feel it is more closely tied to a physical property than other classifications/flags

Polarization is clearly a physical property expressing a coordinate system that is a state enumeration. I'm completely at ease with this.

This is why I suggest the flag is not a Property itself (you aren't going to perform analysis on the Flag), and would be better modeled as an attribute on Property (maybe just PhysicalProperty) so that there is a common access point to this very important qualifier.

I do not agree. The meaning, scope, and cardinality of flags are all too flexible to treat them as an attribute with a predetermined role. I really prefer the current semanticless association.

The suggested changes: I'll post a sketch of this proposal in a new issue.