ivoa-std / VOTable

VOTable Format Definition
4 stars 15 forks source link

Give position components roles in COOSYS #40

Closed msdemlei closed 11 months ago

msdemlei commented 1 year ago

This entails allowing FIELDref-s and PARAMref-s in COOSYS and defining role names for the various compontents.

In consequence, COOSYS now is no longer derived from string. Up to now, writing Some random text was legal. I am pretty sure that was not intended.

Also, adding regression tests for the xSYS elements.

If this is agreed upon, we should probably allow a votable:location on TIMESYS, too.

mbtaylor commented 1 year ago

This is a somewhat significant change to the meaning/usage of COOSYS so I suggest a bit more discussion of the change and its motivation should be included in or referenced (github issue? mailing list?) from this PR; it does more than address issue #16 that is referenced in one of the commit messages. I'm sure the intention is not to sneak it in under the radar, but process should be seen to be followed...

In terms of the form and content of the changes I think it's basically OK, though I have some quibbles with the ad-hoc utype names defined here:

msdemlei commented 1 year ago

On Wed, Jun 28, 2023 at 03:36:27AM -0700, Mark Taylor wrote:

This is a somewhat significant change to the meaning/usage of COOSYS so I suggest a bit more discussion of the change and its motivation should be included in or referenced (github issue?

I'd leave that decision to @tomdonaldson; but yes, this definitely needs some public consensus building.

In terms of the form and content of the changes I think it's basically OK, though I have some quibbles with the ad-hoc utype names defined here:

  • Allowing LonLatPoint-dist to refer to a parallax feels a bit like abuse of the quantity defined in Coords (though Coords doesn't explicitly say so, by my reading it's intended as a radial distance).

It would certainly be great if we didn't have to ad-lib this and there was a DM that has worked all of this out. Anyway, the way this is done here certainly is convenient with astropy (see their PR 14992). Having two different utypes for the concept of "distance" would complicate that implementation (you have to look for both and then make sure you somehow handle the situation when there's both of them). And all that becomes twice as worse when we add errors or each quantity, and eight times worse when we add correlations.

I'll also note that the "nature" of the distance axis is uniquely determined by whether the unit is an angle or a length, and consumers have to inspect the unit anyway. I hence think the proposed design will simplify basically all implementations.

But then I'm happy to consider alternative proposals.

  • ProperMotion-rv sounds like a term that might annoy astrometrists, since by my understanding radial velocity is not a proper motion (but disclaimer: I'm not an astrometrist, so if somebody smarter than me disagrees I will retract).

Let's wait for the few remaining astrometrists of the world to gather up and fetch their muck forks. If they do, we (or the Coords people) can still do whatever they ask us to do.

  • Allowing LonLatPoint-dist to refer to either a parallax or a radial distance, and ProperMotion-rv to refer to either a velocity or a redshift, based on the units of the column in question, seems like an unnecessary bit of economy in utype creation. I think that having separate terms for the different cases (e.g. "parallax", "LonLatPoint-dist", "redshift", "rv") would favour more robust processing.

Lumping redshift and rv together does sound like a bit of a stretch, but IMHO mainly because redshift is unitless. Giving that a well-defined meaning may promote oversights into claims of rather extreme physics. Unless someone wants to save the redshift part of the rv very much, I'd say we drop it; I don't think there is many resources that give both redshifts and tangential proper motions anyway.

For transparency: The term "radial proper motion" is actually used in ESA-SP-1200 (my bible for this kind of thing). That's basically is radial velocity expressed in angle/time (the time derivative of the parallax), and if we keep both parallax and distance in LonLatPoint-dist, I suspect we also should mention that possibility in ProperMotion-rv. I'm not aware of a catalogue giving this "radial proper motion" in the wild, though, so operationally nothing would be lost if didn't do that.

mbtaylor commented 1 year ago
  • Allowing LonLatPoint-dist to refer to a parallax feels a bit like abuse of the quantity defined in Coords (though Coords doesn't explicitly say so, by my reading it's intended as a radial distance).

It would certainly be great if we didn't have to ad-lib this and there was a DM that has worked all of this out. Anyway, the way this is done here certainly is convenient with astropy (see their PR 14992).

I haven't read that Astropy PR in sufficient detail to understand why that's the case, but this sounds like designing the data format to comply with the details of some particular language-specific library code, which I'd say is generally a bad idea. For instance there is not currently code in topcat for which organising the metadata in this way would simplify the implementation.

In any case an (observed) parallax is not a distance; and you might legitimately want to record both.

Let's wait for the few remaining astrometrists of the world to gather up and fetch their muck forks. If they do, we (or the Coords people) can still do whatever they ask us to do.

The intersection of the remaining astrometrists and people who read VOTable github issues is not large, so I feel justified in raising the point on their behalf here. But if ESA-SP-1200 is OK with radial proper motion (in a sense that corresponds with what's being proposed here) it's good enough for me.

msdemlei commented 1 year ago

On Tue, Jul 04, 2023 at 04:33:45AM -0700, Mark Taylor wrote:

I haven't read that Astropy PR in sufficient detail to understand why that's the case, but this sounds like designing the data format to comply with the details of some particular language-specific library code, which I'd say is generally a bad idea. For instance

That is a valid concern, but astropy's dealing with this happens to align with my experience with dealing with this kind of thing elsewhere: You have to dispatch on parallax vs. distance on something. Since you have to deal with the unit anyway (pc vs. kpc vs. metres in distance, arcsec vs. mas vs. deg in parallax), it is not entirely unreasonable to expect that an extra dispatch on a utype won't help implementation a lot.

But I am actually hoping for a second implementation in STIL once this has shaken out a bit, and I'll be happy to be taught otherwise.

In any case an (observed) parallax is not a distance; and you might legitimately want to record both.

That's another valid concern. In plain epoch propagation, this difference does not matter (much), though.

It is a major concern when doing errors, because symmetrical errors in parallax become asymmetrical in distance. Not having tried this, I can't confidently decide whether accounting for that becomes easier when you have two different utypes.

The intersection of the remaining astrometrists and people who read VOTable github issues is not large, so I feel justified in raising

Well, that's why we have proper RFCs and mailing lists. For the record, I have zero emotion on the utype strings. Whatever anyone else proposes and is willing to argue for I'll follow.

tomdonaldson commented 11 months ago

I do think this needs more discussion but am on the fence about whether to merge this to have the discussion on the (soon to be published) Working Draft or have it based on this branch. In either case I think we need to call for more readers/reviewers on the Apps list and possibly DM or Interop since this use case cuts across many interests. I'd welcome more opinions on whether to merge this before I create the WD.

At a slightly lower level, this does seem like a reasonable, perhaps interim, approach to assigning roles to the FIELD values for the purpose of creating SkyCoord instances or other position objects in other frameworks. The use case of applying it to Gaia positions and proper motions in the astropy PR (well worth reading) helps demonstrate its utility (at least for that one long-desired case). All that said, it does bring up the reasonable question of whether we can do something similar based on the recent VODML progress in a similar amount of time.

Since the VODML approach is not tied to VOTABLE version, it does not have the same constraints on our cadence of new versions, so documenting and implementing that feature with VODML could come shortly after a VOTable 1.5 release without needing to wait for 1.6. However fully supporting this use case will not be ready for quite a but longer, then it's probably worth the implementation effort to support this in 1.5.

At the detail level, as I understand it, the main value to this approach is that the FIELDs can be explicitly grouped to show which FIELDs should be used to create which position objects. In that case, should we be very explicit that only of each role can appear in a single COOSYS (and then enforce that in the parsers)?

msdemlei commented 11 months ago

On Fri, Sep 01, 2023 at 02:37:45PM -0700, Tom wrote:

case, should we be very explicit that only of each role can appear in a single COOSYS (and then enforce that in the parsers)?

That's a very good point. Yes, we ought to be clear on this. Indeed, our references would be ideal candidates for attributes, where uniqueness would be guaranteed by XML. But then we already have FIELDref and PARAMref, and it would be a shame not to use them for something they clearly were designed for.

mbtaylor commented 11 months ago

Coming back to this; I suspect we've had as much involvement as we're going to in this PullReq, so I think once any details raised here have been cleared up I would back merging it to master, to facilitate wider review as a published WD or PropRec. I do think it's a worthwhile addition, though it may get some pushback from VODML enthusiasts.

However I still have the misgivings I voiced earlier about semantic overloading of a couple of the utypes (LonLatPoint-dist and ProperMotion-rv). @msdemlei wrote:

That is a valid concern, but astropy's dealing with this happens to align with my experience with dealing with this kind of thing elsewhere: You have to dispatch on parallax vs. distance on something. Since you have to deal with the unit anyway (pc vs. kpc vs. metres in distance, arcsec vs. mas vs. deg in parallax), it is not entirely unreasonable to expect that an extra dispatch on a utype won't help implementation a lot.

It is true that under this scheme client implementations will require fewer lines of code. To put it another way the responsibility for ensuring data integrity is pushed away from client authors towards data providers. Under this arrangement an inadvertently missing unit attribute (e.g. a misspelt attribute name) on a quantity labelled "votable:ProperMotion-rv" means it must be interpreted as a redshift, which would be quite confusing to a non-expert user of the data. It can be hard or impossible for validators to pick up this kind of thing. Of course if all the data is perfectly marked up it's not an issue, but in the real world I still favour having utypes that say what they mean, with clients having to do a bit more work identifying and reporting on inconsistencies.

msdemlei commented 11 months ago

On Thu, Sep 07, 2023 at 01:23:18AM -0700, Mark Taylor wrote:

to a non-expert user of the data. It can be hard or impossible for validators to pick up this kind of thing. Of course if all the data is perfectly marked up it's not an issue, but in the real world I still favour having utypes that say what they mean, with clients having to do a bit more work identifying and reporting on inconsistencies.

I can't say I'm absolutely convinced that creating more ways to produce invalid annotations (like parallax pointing to a value in pc) will actually help data integrity, but then I'm inclined trust the judgement of taplint's author more than mine.

But of course creating these names halfway consistently immediately becomes a pain.

LonLatPoint-dist vs. parallax vs. redshift feels wrong; since they're essentially the same thing, I'd say they should look somewhat the same. Perhaps Radial-dist, Radial-parallax, and Radial-redshift? Of course, we're deviating more from the Coords DM there, which may give us trouble in review.

We then have the same problem on the derivatives. It probably would have been better if Coords had talked about derivatives from the start, but, ah well. Having ProperMotion-lon vs. rv again feels wrong to me. What if we said Derivative-lat, Derivative-lon, Derivative-dist, Derivative-parallax, and Derivative-redshift?

I can't say I like any of that much. So... can I ask you to...

(a) ..once more think hard whether relying on the units (people will have to look at them anyway, right?) isn't the right thing to do after all and

(b) ...if you still want multiple utypes on the radial axis, pick (or invent) some, tell me about them, and then stand by my side if people start arguing.

mbtaylor commented 11 months ago

CoordsDM-compatibility aside, I'd say something like lon, lat, parallax, pm_lon, pm_lat, radial_distance, radial_velocity, redshift. When somebody sees one of those, they know what it means. You could slap a "_error" on them, or pair them with some kind of punctuation to make correlations, if those requirements arise.

Derivative-lon sounds a bit less like a quantity with the cos(delta) correction applied, so I'd say it's more confusing tham pm_lon (as well as being a term that I've never heard used). Derivative-parallax and derivative-redshift ... what would those be for?

As you note, this makes no attempt to align with CoordsDM. My personal suggestion would be to put it out there and see if there are complaints (and if there are, I'd reiterate the points from this discussion).

But I'm not unmoveable. Happy to hear other opinions (@tomdonaldson ?).

tomdonaldson commented 11 months ago

I think you both have raised good points and concerns. The drawbacks either way seem similar in weight to me, so I'll be inclined to go with whatever consensus can be reached in a broader discussion.

From both a data provider and client maintainer perspective, my biggest concern with this is very similar to my concern with VODML-based approaches. That is, is it clear to me how to unambiguously and correctly annotate my data, and can I consume the data unambiguously without making subtle or hidden assumptions?

Since this approach may really just be a stopgap measure before the VODML pipeline matures a little more, I think it's very important to not try to do too much with it. I think this should address a very small number of use cases and we should be very explicit about what they are. Those use cases should not have many options at all, if any. With units in particular, we see missing or incorrect units fairly frequently on RAs and Decs so I've come to not trust them, but for a narrow use case like this, relying on them could be OK I guess.

To be slightly extreme, I would be satisfied if we show that there is exactly one way to annotate a Gaia source list as it is typically returned by ESA today. Seeing that level of communication and interoperability among Gaia providers and clients could actually be a big step forward and could lead to obvious discussions on extending this approach or matching then extending it with VODML.

To that end, I'm inclined to merge this as is, publish a WD, and provoke discussion there. I do think a follow-up PR to make very explicit the limited scope of this feature would make me more comfortable with it, but that can be part of the WD discussion. As people propose changes, we could nudge them over here for discussion in PRs where it helps thread the discussions.

@msdemlei FYI I have rebased and force pushed this branch to remove merge conflicts.

mbtaylor commented 11 months ago

@tomdonaldson that's fine by me.

tomdonaldson commented 11 months ago

I'll try to resolve the conflicts again and get this merged. Funny how all the recent work has been in the same section of the document.

msdemlei commented 11 months ago

On Thu, Sep 07, 2023 at 08:48:26AM -0700, Tom wrote:

to not try to do too much with it. I think this should address a very small number of use cases and we should be very explicit about what they are. Those use cases should not have many options

Exactly: epoch propagation and only epoch propagation.

at all, if any. With units in particular, we see missing or incorrect units fairly frequently on RAs and Decs so I've come to not trust them, but for a narrow use case like this, relying on them could be OK I guess.

We have no choice: If the units are bad, the whole plan breaks.

Actually, I'd claim the reason too many units are still bad or missing is that so far, almost nobody has done anything with them. Now that pyVO is moving towards returning QTables by default (IIRC) you'll see that unit annotation will improve quickly.

To be slightly extreme, I would be satisfied if we show that there is exactly one way to annotate a Gaia source list as it is typically returned by ESA today. Seeing that level of

I'm optimistic my proposal will pass that test. Let's ask them when we've merged, and I promise not to give them secret advice.

msdemlei commented 11 months ago

On Thu, Sep 07, 2023 at 06:42:46AM -0700, Mark Taylor wrote:

CoordsDM-compatibility aside, I'd say something like lon, lat, parallax, pm_lon, pm_lat, radial_distance, radial_velocity, redshift. When somebody sees one of those,

For the record, I'd very much like these (prefixed with votable:, perhaps). Oh, and on consideration, I'd be skipping redshift, where it wouldn't be clear if that's a distance or the derivative of a distance, and nobody wants to do epoch propagation on redshifts anyway, I hope. Which made me notice I'm talking about redshift in rv, which, consequently, doesn't make sense either. I've dropped that in commit 03750f6.

If the utype discussion gets ugly, let's propose those as "if there weren't the Coords DM, this is what we'd use".

Derivative-parallax and derivative-redshift ... what would those be for?

Well, the time derivatives of parallaxes (i.e., distance in angles) or redshifts (i.e., unitless distances). The former are called "radial proper motion" in ESA-SP 1200 (that's HIPPARCOS), but I'm not aware that anyone uses them outside of astrometric computations. If you look at the sixth element of what the array version of esdc_epoch_prop at the ESAC Gaia service returns, pm_rv [mas/yr]: that's it.

I can't point to someone giving derivatives of redshifts used as distance indicators. But that's probably mainly because I'm not watching cosmology services very closely. Then again, dimensionally it's an acceleration, and so we probably don't want to think about it.

Be that as it may: They're not important for what epoch propagation, so except for the entertainment value and possible implications for Coords3 let's forget about them. It's fine to simply do radial_velocity as the radial time derivative.

As you note, this makes no attempt to align with CoordsDM. My personal suggestion would be to put it out there and see if there are complaints (and if there are, I'd reiterate the points from this discussion).

Good plan.

Bonnarel commented 11 months ago

I think refering to Coords DM "roles" using FIELDref and PARAMref utypes make sense. But I wonder why we have to change the VOTable schema to do that. The PR merged here confuses the functionality of COOSYS and GROUP. What is the reason why a GROUP containing the proposed FIELDref and PARAMref and refering to a specific COOSYS should not work ? I don't understand the issue with this method I suggest.

More generally I think it's important to distinguish the functionality of defining a coordinate system (which may be useful fir different FIELDs, PARAMs, GROUPs) and the functionality of GROUPing elements for a given meaning : epoch propagation or whatever.

msdemlei commented 11 months ago

On Thu, Sep 14, 2023 at 02:16:12AM -0700, François Bonnarel wrote:

I think refering to Coords DM "roles" using FIELDref and PARAMref utypes make sense. But I wonder why we have to change the VOTable schema to do that. The PR merged here confuses the functionality of COOSYS and GROUP.

Well, arguably COOSYS should have been a GROUP from the start, so it's not overly surprising that making it work properly moves it towards a GROUP.

But I'm not married to this specific arrangement.

What is the reason why a GROUP containing the proposed FIELDref and PARAMref and refering to a specific COOSYS should not work ? I don't understand the issue with this method I suggest.

That probably would work well, too; you'd write something like

<COOSYS ID="foo" .../>

...

The downside is that you'll have another ID/ref pair to manage, and both servers and clients will have to do that every time they'll serialise and deserialise VOTables.

Compared to that, a very minor change to the VOTable schema seems preferable to me, but I won't quarrel a lot, in particular because there's a metric ton of referencing going on already, and that can't be helped realistically.

Suggestions: Have a look at https://github.com/astropy/astropy/pull/14992. Feel free to fork it and put in the extra indirection you suggest. If you then still prefer the extra GROUP, I'll do a PR against VOTable to revert the schema change.

More generally I think it's important to distinguish the functionality of defining a coordinate system (which may be useful fir different FIELDs, PARAMs, GROUPs) and the functionality of GROUPing elements for a given meaning : epoch propagation or whatever.

Why do you think that? What would be the advantage of avoiding two COOSYS-es of, say, ICRS BARYCENTER? (Other than saving a single element, that is, but in typical VOTables an element more or less is entirely irrelevant).

Bonnarel commented 11 months ago

Le 14/09/2023 à 13:17, msdemlei a écrit :

On Thu, Sep 14, 2023 at 02:16:12AM -0700, François Bonnarel wrote:

I think refering to Coords DM "roles" using FIELDref and PARAMref utypes make sense. But I wonder why we have to change the VOTable schema to do that. The PR merged here confuses the functionality of COOSYS and GROUP.

Well, arguably COOSYS should have been a GROUP from the start, so it's not overly surprising that making it work properly moves it towards a GROUP.

But I'm not married to this specific arrangement.

What is the reason why a GROUP containing the proposed FIELDref and PARAMref and refering to a specific COOSYS should not work ? I don't understand the issue with this method I suggest.

That probably would work well, too; you'd write something like

<COOSYS ID="foo" .../>

...

Yes exactly this !

The downside is that you'll have another ID/ref pair to manage, and both servers and clients will have to do that every time they'll serialise and deserialise VOTables.

Compared to that, a very minor change to the VOTable schema seems preferable to me, but I won't quarrel a lot, in particular because there's a metric ton of referencing going on already, and that can't be helped realistically.

Exactly ! Suggestions: Have a look at https://github.com/astropy/astropy/pull/14992. Feel free to fork it and put in the extra indirection you suggest. If you then still prefer the extra GROUP, I'll do a PR against VOTable to revert the schema change. OK. agreed.

More generally I think it's important to distinguish the functionality of defining a coordinate system (which may be useful fir different FIELDs, PARAMs, GROUPs) and the functionality of GROUPing elements for a given meaning : epoch propagation or whatever.

Why do you think that? What would be the advantage of avoiding two COOSYS-es of, say, ICRS BARYCENTER? (Other than saving a single element, that is, but in typical VOTables an element more or less is entirely irrelevant).

Well, coordinate systems are some well defined reference system of the space. As already said

the initial idea of COOSYS was to have one element gathering all the features of coordinate systems

This is not only used internally as in the epoch propagation use case but also for comparison across VOtables

conversions, etc...

So I think it's better to let them independant (but refereable) from operational features.

— Reply to this email directly, view it on GitHub https://github.com/ivoa-std/VOTable/pull/40#issuecomment-1719256399, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMP5LTFWS5FNKSB46EPTBJLX2LRTDANCNFSM6AAAAAAZVSZC3U. You are receiving this because you commented.Message ID: @.***>

msdemlei commented 11 months ago

On Fri, Sep 22, 2023 at 06:57:51AM -0700, François Bonnarel wrote:

This is not only used internally as in the epoch propagation use case but also for comparison across VOtables conversions, etc...

Hm -- if you do that, you'll have to compare the various aspects (frame, refposition, perhaps equinox, etc) of the coosys anyway. There simply is no way comparing element identity (of the COOSYS element) would ever be robust, so... I don't think that's a strong argument for complicating the use of COOSYS in astrometric operations.