ivoa / dm-usecases

The is repo gathers all the material to be used in the DM workshop 2020
The Unlicense
1 stars 3 forks source link

Improvements after Mark's first review #5

Closed msdemlei closed 3 years ago

msdemlei commented 3 years ago

This PR fixes a a bug and adds some extra annotation:

Also, DaCHS now chooses more human-friendly identifiers when it can, so this should be a bit less brain-straining to comprehend.

Apologies for pushing it in one commit, but I forgot to pull and commit intermediate versions while the software and data annotation evolved, and now I'm too lazy to fiddle the things back, since I doubt anyone would look at the individual diffs anyway.

mcdittmar commented 3 years ago

I hope we will be cautious about updating the base files for the use cases since implementations are already underway.

Markus.. I'm seeing

raw file

annotated file:

The first SphericalCoordinate

While the 2nd SphericalCoordinate

Two different Annotations/representations of the same type (SphericalCoordinate).

Mark

On Mon, Mar 1, 2021 at 9:01 AM Laurent MICHEL notifications@github.com wrote:

Merged #5 https://github.com/ivoa/dm-usecases/pull/5 into main.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ivoa/dm-usecases/pull/5#event-4389718687, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADMLJCRFIWI4HL5GLPMVXBLTBOM4PANCNFSM4YMQ5ZXA .

lmichel commented 3 years ago

I agree with MCD. Raw_data shouldn’t be changed except in case of serious inconsistances. It would be safer to add a new raw file instead of pushing modifications that could break some other contributions.

Laurent

On 1 Mar 2021, at 18:06, Mark Cresitello-Dittmar notifications@github.com wrote:

I hope we will be cautious about updating the base files for the use cases since implementations are already underway.

Markus.. I'm seeing

raw file

  • new COOSYS, identical to first, used in new ssa_location PARAM
  • new TIMESYS, identical to first, unused
  • new PARAMs in 2nd PhotCAL which make it identical to first other than the referenced column (which is not part of the PhotDM model)
  • changed IDs for 'title', 'ra', 'dec' PARAMs
  • added ssa_location PARAM, which contains ra, dec values plus reference to 2nd COOSYS this is a basically complete Coordinate instance in one PARAM ra and dec values duplicate the independent 'ra', and 'dec' PARAMs I don't see any problems here since the duplicated content could be different

annotated file:

  • generated IDs for annotation elements are changed
  • 'extra' Cube instance removed
  • added Target instance in Dataset
  • pointer to 'ra' PARAM
  • pointer to 'dec' PARAM
  • pointer to 'ssa_location' PARAM which has [ ra, dec, coosys_ref ] Is this intentional? Why not just a pointer to coosys to complete the Coord spec.?
  • added 2nd Coords instance
  • obs_time
  • ssa_location (see below) I'm not sure what the 'stc2:Coords' elements are for, so I guess this is OK. I'm just wondering what the intent is and why isn't ssa_location just added to the first Coord instance?

The first SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • longitude = pointer to 'ra' PARAM
  • latitude = pointer to 'dec' PARAM

While the 2nd SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • value = pointer to PARAM with [ ra, dec, coosys_ref ] all together

Two different Annotations/representations of the same type (SphericalCoordinate).

  • Is that also part of your Annotation syntax spec?
  • could the second one JUST be a pointer to ssa_location PARAM?

Mark

On Mon, Mar 1, 2021 at 9:01 AM Laurent MICHEL notifications@github.com wrote:

Merged #5 https://github.com/ivoa/dm-usecases/pull/5 into main.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ivoa/dm-usecases/pull/5#event-4389718687, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADMLJCRFIWI4HL5GLPMVXBLTBOM4PANCNFSM4YMQ5ZXA .

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/ivoa/dm-usecases/pull/5#issuecomment-788112756, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXOP6EGENWWDTVQCBR3CGLTBPCSXANCNFSM4YMQ5ZXA.

msdemlei commented 3 years ago

On Mon, Mar 01, 2021 at 09:12:51AM -0800, Laurent MICHEL wrote:

I agree with MCD. Raw_data shouldn’t be changed except in case of serious inconsistances. It would be safer to add a new raw file instead of pushing modifications that could break some other contributions.

In principle, that's true, but on the other hand there's the problem of inflating the number of cases if we keep adding files even for rather moderate changes.

Here, since the "symbolic" ids really make playing with the stuff easier for humans and the second Ada-style time series group was actually missing, I'd suggest to be a bit lenient here.

Of course, people are totally free to ignore the SSA-style position I've also added -- that's only there to illustrate the point on why we shouldn't link non-STC models to specific representations: Even our own protocols don't agree on how to do it.

As long as people can annotate either version, I'd say things are fine.

lmichel commented 3 years ago

Markus,

Your are right. I would just remind people to be cautious when they update a raw_data files,

Laurent — Translate with https://www.deepl.com/translator -- jesuischarlie/Tunis/Paris/Bruxelles/Berlin

Laurent Michel SSC XMM-Newton Tél : +33 (0)3 68 85 24 37 Fax : +33 (0)3 )3 68 85 24 32 Université de Strasbourg http://www.unistra.fr Observatoire Astronomique 11 Rue de l'Université F - 67200 Strasbourg

On 2 Mar 2021, at 12:52, msdemlei notifications@github.com wrote:

On Mon, Mar 01, 2021 at 09:12:51AM -0800, Laurent MICHEL wrote:

I agree with MCD. Raw_data shouldn’t be changed except in case of serious inconsistances. It would be safer to add a new raw file instead of pushing modifications that could break some other contributions.

In principle, that's true, but on the other hand there's the problem of inflating the number of cases if we keep adding files even for rather moderate changes.

Here, since the "symbolic" ids really make playing with the stuff easier for humans and the second Ada-style time series group was actually missing, I'd suggest to be a bit lenient here.

Of course, people are totally free to ignore the SSA-style position I've also added -- that's only there to illustrate the point on why we shouldn't link non-STC models to specific representations: Even our own protocols don't agree on how to do it.

As long as people can annotate either version, I'd say things are fine. — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/ivoa/dm-usecases/pull/5#issuecomment-788853664, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXOP6BSAQ373APRB7OEA4DTBTGPJANCNFSM4YMQ5ZXA.

msdemlei commented 3 years ago

On Mon, Mar 01, 2021 at 09:06:51AM -0800, Mark Cresitello-Dittmar wrote:

raw file

  • new COOSYS, identical to first, used in new ssa_location PARAM
  • new TIMESYS, identical to first, unused

Yeah, that's an artefact of the misdesign of using ref on FIELD for TIMESYS (in parallel to COOSYS). The logic is that obs_time is part of two STC instances (the one with ra and dec) and the one with ssa_location). Hence, it would need to reference two STC metadata sets, but because @ref can only have one value, it can't, and hence the second TIMESYS is unreferenced.

Ugly (and I had argued stronly against keeping this ref mechanism when we designed TIMESYS), but we're here because we finally want to fix this mess.

  • new PARAMs in 2nd PhotCAL which make it identical to first other than the referenced column (which is not part of the PhotDM model)

Right -- the photometric system of flux and phot is identical in this case. It was an oversight that in the original upload flux was unannotated, so this fixes that. But again, that's legacy annotation irrelevant for what we're doing here.

  • changed IDs for 'title', 'ra', 'dec' PARAMs

That's the main selling point of changing raw_data.

  • added ssa_location PARAM, which contains ra, dec values plus reference to 2nd COOSYS this is a basically complete Coordinate instance in one PARAM ra and dec values duplicate the independent 'ra', and 'dec' PARAMs I don't see any problems here since the duplicated content could be different

Perhaps, but that's not the point here; the point is that as we evolve the VO, this kind of thing, having parallel representations of some information in "older" and "newer" annotation will be necessary to avoid flag days, which we simply can't have in the VO.

annotated file:

  • generated IDs for annotation elements are changed
  • 'extra' Cube instance removed
  • added Target instance in Dataset
    • pointer to 'ra' PARAM
    • pointer to 'dec' PARAM
    • pointer to 'ssa_location' PARAM which has [ ra, dec, coosys_ref ] Is this intentional? Why not just a pointer to coosys to complete the Coord spec.?

I'm afraid I don't understand what you are asking here -- which annotation are you referring to here? If you're wondering about the ref attribute the PARAM: that's legacy VOTable, not DM; as I said, that has turned out to be a bad idea a long time ago and was a main motiviation for proposing the STC-in-VOTable note.

  • added 2nd Coords instance
    • obs_time
    • ssa_location (see below) I'm not sure what the 'stc2:Coords' elements are for, so I guess this is OK. I'm just wondering what the intent is and why isn't ssa_location just added to the first Coord instance?

"stc2:Coords" simply links space and time and assigns the relevant frame metadata -- much like in STC1.

As to why there are two coords instances: well, we might think about allowing, in my notation, latitude, longitude, and value in one instance (as in: split values and a point), but I don't think "different serialisations" is a terribly useful use case.

No, this is just intended as a demonstration that you can have multiple different "representations" (i.e., complete sets of values and annotations) and use all of them in Dataset.target; again, this is the "avoiding flag days" thing.

The first SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • longitude = pointer to 'ra' PARAM
  • latitude = pointer to 'dec' PARAM

While the 2nd SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • value = pointer to PARAM with [ ra, dec, coosys_ref ] all together

Two different Annotations/representations of the same type (SphericalCoordinate).

  • Is that also part of your Annotation syntax spec?
  • could the second one JUST be a pointer to ssa_location PARAM?

The two annotations of the same type will probably be the exception rather than the norm. The more typical case would be having both an stc2 and an stc3 annotation on ra/dec.

But with the algorithm "iterate over the annotations and choose one that fits your purpose" this particular case can be covered, too.

mcdittmar commented 3 years ago

On Wed, Mar 3, 2021 at 4:12 AM msdemlei notifications@github.com wrote:

On Mon, Mar 01, 2021 at 09:06:51AM -0800, Mark Cresitello-Dittmar wrote:

raw file

  • new COOSYS, identical to first, used in new ssa_location PARAM
  • new TIMESYS, identical to first, unused

Yeah, that's an artefact of the misdesign of using ref on FIELD for TIMESYS (in parallel to COOSYS). The logic is that obs_time is part of two STC instances (the one with ra and dec) and the one with ssa_location). Hence, it would need to reference two STC metadata sets, but because @ref can only have one value, it can't, and hence the second TIMESYS is unreferenced.

Since they are identical, you only need the 1 instance, and all elements which are in that TIMESYS can reference the same instance.

  • added ssa_location PARAM, which contains ra, dec values plus reference to 2nd COOSYS this is a basically complete Coordinate instance in one PARAM ra and dec values duplicate the independent 'ra', and 'dec' PARAMs I don't see any problems here since the duplicated content could be different

Perhaps, but that's not the point here; the point is that as we evolve the VO, this kind of thing, having parallel representations of some information in "older" and "newer" annotation will be necessary to avoid flag days, which we simply can't have in the VO.

There is a point here w.r.t. the annotation requirements though. Your annotation for Target.position

One option would be to allow annotation of the complex object

There is a limitation in the Mapping syntax in that it cannot annotate individual array elements..like the ra dec in a <PARAM name="position" type=double arraysize=2 value="123.222 -12.222" /> So one is forced to use LITERALs and duplicate the values

is the "avoiding flag days" thing.

What does "avoiding flag days" mean?

The first SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • longitude = pointer to 'ra' PARAM
  • latitude = pointer to 'dec' PARAM

While the 2nd SphericalCoordinate

  • orientation = "ICRS". ---\ From
  • epoch = "J2015.5". ---/ COOSYS
  • value = pointer to PARAM with [ ra, dec, coosys_ref ] all together

The two annotations of the same type will probably be the exception rather than the norm. The more typical case would be having both an stc2 and an stc3 annotation on ra/dec.

But with the algorithm "iterate over the annotations and choose one that fits your purpose" this particular case can be covered, too.

OK.. I can see that being useful to include, it's an important feature of your annotation scheme. There is still the problem that 'SphericalCoordinate.value' is pointing to a complex PARAM, which goes back to the earlier point.

Mark

msdemlei commented 3 years ago

On Thu, Mar 04, 2021 at 08:59:56AM -0800, Mark Cresitello-Dittmar wrote:

On Wed, Mar 3, 2021 at 4:12 AM msdemlei notifications@github.com wrote:

Yeah, that's an artefact of the misdesign of using ref on FIELD for TIMESYS (in parallel to COOSYS). The logic is that obs_time is part of two STC instances (the one with ra and dec) and the one with ssa_location). Hence, it would need to reference two STC metadata sets, but because @ref can only have one value, it can't, and hence the second TIMESYS is unreferenced.

Since they are identical, you only need the 1 instance, and all elements which are in that TIMESYS can reference the same instance.

Sure, but as I said I maintain that is a consequence of the mis-design of COOSYS and TIMESYS, and as I hope we'll either get rid of them or fix them, I'll not introduce hacks into my code to catch this situation.

Perhaps, but that's not the point here; the point is that as we evolve the VO, this kind of thing, having parallel representations of some information in "older" and "newer" annotation will be necessary to avoid flag days, which we simply can't have in the VO.

There is a point here w.r.t. the annotation requirements though. Your annotation for Target.position

  • points individually to the ra param == RA (longitude)
  • points individually to the dec param == DEC (latitude)
  • points to the ssa_location param << contains (ra, dec, frameref ), so what is the user expected to pull from this? what does it represent?

In the use case "find the target location", the client will cycle through the columns referenced from data set until it finds an STC annotation it understands (or looks at all or them and chooses its preferred one). The others it simply ignores.

One option would be to allow annotation of the complex object

  • annotate SphericalCoordinate to ssa_location PARAM directly
  • which puts the burden on the client to properly parse the PARAM content into longitude, latitude, frame

Our annotation scheme should certainly work for the DALI types (point, circle, polygon, moc, interval...); me, I'd say a spherical point should have the model

  frame:
  long:
  lat:
  value:

(not sure if ra/dec and value should be mutually exclusive; probably not), where value just references a DALI point, and similarly a circle (if we still want it as such; I believe given what's happened in DALI we just need a geometry type linking anything with frame metadata):

  center_long:
  center_lat:
  radius:
  value:

There is a limitation in the Mapping syntax in that it cannot annotate individual array elements..like the ra dec in a <PARAM name="position" type=double arraysize=2 value="123.222 -12.222" /> So one is forced to use LITERALs and duplicate the values

Referencing array elements is a possibility in the case of points, but for our MOCs, say, that won't work. Which is why I'd much prefer if we'd enable referencing complete DALI values.

is the "avoiding flag days" thing.

What does "avoiding flag days" mean?

See https://en.wikipedia.org/wiki/Flag_day_(computing)

-- since in the VO, we simply cannot move all services and clients to a new version of anything in one go, we'll have to be very careful to design our standards such that we can do without them.

mcdittmar commented 3 years ago

Markus,

I'm not sure you're following my point..

I am confused by your annotation segment

where

<PARAM ID="ssa_location" arraysize="2" datatype="double" name="ssa_location" ref="system0" ucd="pos.eq" value="315.018457397759 35.3014389949649" xtype="point">

Target position as per SSA (a DALI point)

Because the 3rd CONSTANT (ssa_location) has a lot of information in it, and I'm not sure what the client is expected to pull out. Presumably the 'ref' to the coordinate system, but there is no reason the client would know that is what to grab, the 'value' certainly isn't right since that is another (ra,dec) pair.

So, I was wondering if the annotation scheme we adopt should allow/accommodate annotating a complex object with 1 reference.

The ssa_location PARAM has the (ra,dec) pair at 'value', and the coordsys reference, all together, which fills a SphericalCoordinate in 1 line. So one could, perhaps allow:

I'm not advocating one way or the other, but positing the question.

Mark

On Fri, Mar 5, 2021 at 5:21 AM msdemlei notifications@github.com wrote:

On Thu, Mar 04, 2021 at 08:59:56AM -0800, Mark Cresitello-Dittmar wrote:

On Wed, Mar 3, 2021 at 4:12 AM msdemlei notifications@github.com wrote:

Yeah, that's an artefact of the misdesign of using ref on FIELD for TIMESYS (in parallel to COOSYS). The logic is that obs_time is part of two STC instances (the one with ra and dec) and the one with ssa_location). Hence, it would need to reference two STC metadata sets, but because @ref can only have one value, it can't, and hence the second TIMESYS is unreferenced.

Since they are identical, you only need the 1 instance, and all elements which are in that TIMESYS can reference the same instance.

Sure, but as I said I maintain that is a consequence of the mis-design of COOSYS and TIMESYS, and as I hope we'll either get rid of them or fix them, I'll not introduce hacks into my code to catch this situation.

Perhaps, but that's not the point here; the point is that as we evolve the VO, this kind of thing, having parallel representations of some information in "older" and "newer" annotation will be necessary to avoid flag days, which we simply can't have in the VO.

There is a point here w.r.t. the annotation requirements though. Your annotation for Target.position

  • points individually to the ra param == RA (longitude)
  • points individually to the dec param == DEC (latitude)
  • points to the ssa_location param << contains (ra, dec, frameref ), so what is the user expected to pull from this? what does it represent?

In the use case "find the target location", the client will cycle through the columns referenced from data set until it finds an STC annotation it understands (or looks at all or them and chooses its preferred one). The others it simply ignores.

One option would be to allow annotation of the complex object

  • annotate SphericalCoordinate to ssa_location PARAM directly
  • which puts the burden on the client to properly parse the PARAM content into longitude, latitude, frame

Our annotation scheme should certainly work for the DALI types (point, circle, polygon, moc, interval...); me, I'd say a spherical point should have the model

frame:
long:
lat:
value:

(not sure if ra/dec and value should be mutually exclusive; probably not), where value just references a DALI point, and similarly a circle (if we still want it as such; I believe given what's happened in DALI we just need a geometry type linking anything with frame metadata):

center_long:
center_lat:
radius:
value:

There is a limitation in the Mapping syntax in that it cannot annotate individual array elements..like the ra dec in a <PARAM name="position" type=double arraysize=2 value="123.222 -12.222" /> So one is forced to use LITERALs and duplicate the values

Referencing array elements is a possibility in the case of points, but for our MOCs, say, that won't work. Which is why I'd much prefer if we'd enable referencing complete DALI values.

is the "avoiding flag days" thing.

What does "avoiding flag days" mean?

See https://en.wikipedia.org/wiki/Flag_day_(computing)

-- since in the VO, we simply cannot move all services and clients to a new version of anything in one go, we'll have to be very careful to design our standards such that we can do without them.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ivoa/dm-usecases/pull/5#issuecomment-791324931, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADMLJCRSXILBEXE42ZDQSXLTCCWDHANCNFSM4YMQ5ZXA .

msdemlei commented 3 years ago

On Sat, Mar 06, 2021 at 02:19:10PM -0800, Mark Cresitello-Dittmar wrote:

I'm not sure you're following my point..

I am confused by your annotation segment

where

<PARAM ID="ssa_location" arraysize="2" datatype="double" name="ssa_location" ref="system0" ucd="pos.eq" value="315.018457397759 35.3014389949649" xtype="point">

Target position as per SSA (a DALI point)

Because the 3rd CONSTANT (ssa_location) has a lot of information in it, and I'm not sure what the client is expected to pull out. Presumably the 'ref' to the coordinate system, but there is no reason the client would know that is what to grab, the 'value' certainly isn't right since that is another (ra,dec) pair.

A VOTable parser would know that this is just a point as per DALI (the xtype="point"). Note that DALI points come without any metadata by design (we had the previous STCS strings that contained them, and it was just pain). Also note that a data model-aware VOTable parser would ignore the ref to COOSYS -- the main reason I'm here is that COOSYS doesn't work well and needs to be replaced.

So, all the "target" annotation says is: "If you're looking for the target of the observation, check if you understand the annotation of any of ra, dec, or ssa_location."

Assuming the client would start its inspection with ssa_location, it would see that there is an stc2:Coords annotation for it (starting in line 171 in commit d6a2b5d8f07645d6cb929078e9960a35bce1b531). If it understands that as a position, it will stop right there, having a fully annotated target position. The value of the point is parsed using DALI rules.

So, I was wondering if the annotation scheme we adopt should allow/accommodate annotating a complex object with 1 reference.

The ssa_location PARAM has the (ra,dec) pair at 'value', and the coordsys reference, all together, which fills a SphericalCoordinate in 1 line. So one could, perhaps allow:

While I'm sure we need to be able to annotate the DALI types, we have to keep in mind that they are metadata-less. Hence, they still need to be annotated with, in this case, the frame, and they'll ususally have to be packed up with a time coordinate.

Hence, you cannot leave out the SpaceFrame annotation, and hence you can't directly use ssa_location as an instance.

You could argue that ra/dec should have another grouping level over value (compare the dmrole="space" annotations in the two stc2:Coords cases), but frankly, my take would be that that's just a bit of additional tag soup without helping anyone a lot.

mcdittmar commented 3 years ago

On Mon, Mar 8, 2021 at 2:58 AM msdemlei notifications@github.com wrote:

On Sat, Mar 06, 2021 at 02:19:10PM -0800, Mark Cresitello-Dittmar wrote:

A VOTable parser would know that this is just a point as per DALI (the xtype="point"). Note that DALI points come without any metadata by design (we had the previous STCS strings that contained them, and it was just pain). Also note that a data model-aware VOTable parser would ignore the ref to COOSYS -- the main reason I'm here is that COOSYS doesn't work well and needs to be replaced.

So, all the "target" annotation says is: "If you're looking for the target of the observation, check if you understand the annotation of any of ra, dec, or ssa_location."

Assuming the client would start its inspection with ssa_location, it would see that there is an stc2:Coords annotation for it (starting in line 171 in commit d6a2b5d8f07645d6cb929078e9960a35bce1b531). If it understands that as a position, it will stop right there, having a fully annotated target position. The value of the point is parsed using DALI rules.

Ah.. so, it looks like I was also not getting your point! I'm still looking at these as if they represent an underlying model, and trying to reconcile what that model is.. So, with your syntax, the objects aren't even expected to follow a model spec?

In your syntax, when does one create a different instance of the object vs adding more representations into the same instance? It looks like maybe:

To see if I've got it right.. is the following OK? ds:Target also has a 'name', so can I add this annotation within the same AstroTarget instance as in the example (with the ra,dec,ssa_location).?

And clients can choose the simple string or the vocabulary. NOTE: it doesn't really matter in this syntax whether or not the model holding the namespace "ds" actually expresses Target.name as a mango:VocabularyTerm or (ra,dec) as a 'meas:Position' vs 'dali:Point' NOTE: I assume that your syntax has a container for elements with multiplicity >1, so one can distinguish between new instances in the list and different representations of the same element.

Mark

msdemlei commented 3 years ago

On Mon, Mar 08, 2021 at 07:37:29AM -0800, Mark Cresitello-Dittmar wrote:

On Mon, Mar 8, 2021 at 2:58 AM msdemlei notifications@github.com wrote:

On Sat, Mar 06, 2021 at 02:19:10PM -0800, Mark Cresitello-Dittmar wrote: So, all the "target" annotation says is: "If you're looking for the target of the observation, check if you understand the annotation of any of ra, dec, or ssa_location."

Assuming the client would start its inspection with ssa_location, it would see that there is an stc2:Coords annotation for it (starting in line 171 in commit d6a2b5d8f07645d6cb929078e9960a35bce1b531). If it understands that as a position, it will stop right there, having a fully annotated target position. The value of the point is parsed using DALI rules.

Ah.. so, it looks like I was also not getting your point! I'm still looking at these as if they represent an underlying model, and trying to reconcile what that model is.. So, with your syntax, the objects aren't even expected to follow a model spec?

Oh, each individual object of course is -- the assumption is that there is VO-DML that says that an stc2:Coords has time and space attributes, that space can be a SphericalCoordinate (and possibly something else, too), and that that in turn has latitude and longitude (or just a value) and references a frame.

It's references across models that I consider so pernicious that I'm suggesting a mitigation, in particular through independent annotations of the same columns.

  • Target model presumably says it has a 'position' or "ra","dec" anyway.

No, target would say "this is where you can find target annotation". This is intentionally vague -- think of gravitational waves. Their primary "target" attribute would probably be one or more MOCs with the odd shapes they have for their likelihood regions.

However, it would be totally ok if they additionally pointed to a point or a ra, dec pair for simple clients that don't understand the MOC. To make that happen, it would say:

  <ATTRIBUTE dmrole="target">
    <INSTANCE dmtype="ds:AstroTarget">
      <ATTRIBUTE dmrole="position">
        <CONSTANT ref="location_for_legacy_clients"/>
        <CONSTANT ref="likelhood_moc"/>
      </ATTRIBUTE>
    </INSTANCE>
  </ATTRIBUTE>

or, considering what's below, perhaps rather:

  <ATTRIBUTE dmrole="target">
    <INSTANCE dmtype="ds:AstroTarget">
      <ATTRIBUTE dmrole="position">
        <CONSTANT ref="location_for_legacy_clients"/>
      </ATTRIBUTE>
      <ATTRIBUTE dmrole="position">
        <CONSTANT ref="likelhood_moc"/>
      </ATTRIBUTE>
    </INSTANCE>
  </ATTRIBUTE>

-- and GW-aware clients would just look at all params mentioned in target and unconditionally prefer MOC-like annotation.

  • Annotation of Target contains 1 or more representations of content which can be interpreted as a position: (RA, DEC) or (ssa_location)

...or, really, as anything else as the things evolve -- we'll never have to change ds:Dataset just because people need different representations for their targets.

  • Choose the one that you like

or "can deal with", more likely.

In your syntax, when does one create a different instance of the object vs adding more representations into the same instance? It looks like maybe:

  • create a new instance if the content has changed (ie: is a different version of the object)
  • append to the same instance if it is part of that same content.

In principle, this is essentially as in normal standards development: As long as you can compatibly change things, you add and modify. Once you have to make a breaking change, you add a new annotation (but can retain the old one for as long as necessary so you don't break legacy clients).

The reality may be more subtle as in the time series example, where there simply are two different serialisations for the same information, and there may be good reasons (as in: there are two clients relevant for the community, and one only knows representation a, and the other only knows representation b) to keep both. In that case, as, here, you could have multiple annotations from the same model (though I kind of regret having introduced this example, as it's probably a fringe case compared to the hopefully much more common case of having multiple annotations for the same things).

To see if I've got it right.. is the following OK? ds:Target also has a 'name', so can I add this annotation within the same AstroTarget instance as in the example (with the ra,dec,ssa_location).?

And clients can choose the simple string or the vocabulary.

In principle, it might work this way, yes (where I'd much rather keep the actual values in proper PARAMs, though).

NOTE: it doesn't really matter in this syntax whether or not the model holding the namespace "ds" actually expresses Target.name as a mango:VocabularyTerm or (ra,dec) as a 'meas:Position' vs 'dali:Point'

No... you can't just invent attribute names -- they have to be given by the model, VO-DML as we know and perhaps love it.

And that model could of course prescribe that name just is a string (in this case I'd quite certainly do that, because it pretty certainly helps the clients a lot, but then this is a toy example, so let's pretend otherwise).

It's just when referencing things outside of the model we should in general rely on referencing columns rather than external instances.

In that sense, a better example would be if there was an Identifier DM (prefix id), and then later a model id2 that replaces the human-oriented from_cat with a machine-oriented uriprefix attribute. Then you'd have something like this:

<VODML>
<INSTANCE dmtype="ds:AstroTarget">
  <ATTRIBUTE dmrole="name" ref="target_name">
  ...
</INSTANCE>

<INSTANCE dmtype="id:Identifier">
  <ATTRIBUTE name="from_cat"><LITERAL>LEDA</LITERAL></ATTRIBUTE>
  <ATTRIBUTE name="value"><CONSTANT ref="target_name"/></ATTRIBUTE>
</INSTANCE>

<INSTANCE dmtype="id2:Identifier">
  <ATTRIBUTE name="uriprefix"><LITERAL
    >http://vizcat.cds.fr/cats/LEDA#</LITERAL></ATTRIBUTE>
  <ATTRIBUTE name="value"><CONSTANT ref="target_name"/></ATTRIBUTE>
</INSTANCE>
</VODML>

<PARAM ID="target_name" ... value="889060"/>

-- the difference being that name is still unique (which is in general desirable) but different annotations for it can be chosen as convenient for a given client.

NOTE: I assume that your syntax has a container for elements with multiplicity >1, so one can distinguish between new instances in the list and different representations of the same element.

I'd say experience has to show if that distinction actually needs to be made in the serialisation. But true, we should think about the use case "sequence of externally annotated things". My current annotation experiment doesn't have that. I suppose we could say "repeated attributes are alternatives", but I've not really thought that through.