ivoa-std / ADQL

Astronomical Data Query Language Standard
https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
Creative Commons Attribution Share Alike 4.0 International
7 stars 7 forks source link

Clarify discussion of coordsys argument #58

Closed mbtaylor closed 3 years ago

mbtaylor commented 3 years ago

Section 4.2.4 "Coordsys" provides a useful historical review of the coordsys argument in geometry constructor functions:

For historical reasons, the geometry constructors (BOX, CIRCLE, POINT and POLYGON) all accept an optional string literal as the first argument. This was originally intended to carry information on a reference system or other coordinate system metadata. As of this version of the specification this parameter has been marked as deprecated.

But this is slightly misleading, since it sounds (to me) like the coordsys parameter has always been optional. I suggest rewording to make it clear that it has become optional only in this version, e.g. change the last sentence to:

As of this version of the specification this parameter has been made optional and marked as deprecated.

The following sentence says

Services are permitted to ignore this parameter and clients are advised to pass an empty string here.

but in all the examples later in the document (e.g. sec 4.2.9, 4.2.11), the first argument is omitted, not passed as an empty string. Should this advice be changed to "clients are advised to omit this argument"?

gmantele commented 3 years ago

Section 4.2.4 "Coordsys" provides a useful historical review of the coordsys argument in geometry constructor functions:

For historical reasons, the geometry constructors (BOX, CIRCLE, POINT and POLYGON) all accept an optional string literal as the first argument. This was originally intended to carry information on a reference system or other coordinate system metadata. As of this version of the specification this parameter has been marked as deprecated.

But this is slightly misleading, since it sounds (to me) like the coordsys parameter has always been optional. I suggest rewording to make it clear that it has become optional only in this version, e.g. change the last sentence to:

As of this version of the specification this parameter has been made optional and marked as deprecated.

Agreed. I would even go a bit further by removing the first "optional" term as its optional state comes with the deprecation. Keeping it here would mean that it has always been optional. Or just saying "all accept a now optional string literal" (but I don't know it is really a good english way to say it) or "may now all accept a string literal". But the simplest is probably to remove it.

So, combining your proposal and mine:

For historical reasons, the geometry constructors (BOX, CIRCLE, POINT and POLYGON) all accept a string literal as the first argument. [...] As of this version of the specification this parameter has been made optional and marked as deprecated.

gmantele commented 3 years ago

The following sentence says

Services are permitted to ignore this parameter and clients are advised to pass an empty string here.

but in all the examples later in the document (e.g. sec 4.2.9, 4.2.11), the first argument is omitted, not passed as an empty string. Should this advice be changed to "clients are advised to omit this argument"?

You're right, some precisions are probably needed here. What about the following?

Services are permitted to ignore this parameter and clients are advised, if possible, to omit this argument or otherwise to pass an empty string.

mbtaylor commented 3 years ago

That sounds fine to me.

Zarquan commented 3 years ago

Services are permitted to ignore this parameter and clients are advised, if possible, to omit this argument or otherwise to pass an empty string.

If a client sends this:

.... CIRCLE('GALACTIC', 30.0, 30.0, 0.1)

Are we happy for the server to just ignore the coordinate system specified in the query and return a response based on 'ICRS' - without warning the client that they are not getting what they asked for?

Perhaps it might be better for the server to have to read the parameter but allowed to reject anything it didn't understand. Fail early with an error message rather than fail silently and return the wrong numbers.

gmantele commented 3 years ago

You're absolutly right @Zarquan . But an error for a deprecated features that probably did not work in the past for a specific implementation sounds a bit too exteme for me. A warning seems more appropriate, though is not very often noticed by users (too impatient to get their result).

If I take VOLLT/TAPLib as an example of ADQL-2.0 implementation, depending on its configuration (which is written by the service publisher, not the library itself), the coord. sys. argument is either silently ignored or an error is thrown if not matching some predefined configured coord. sys. (either limited to the one really used in the database, or to multiple one in the case conversions are supported like in TAP-VizieR). There is currently no mechanism for warnings, but it could be done quite simply.

Anyway, I don't know if ADQL is the good place to define such warning/error. I may be wrong here, but it is something that is implementation dependent. So, maybe the right way to do it in ADQL is to eventually add a sentence like this:

In case a non empty coordinate system argument is provided but ignored by an implementation, this latter should report it as a warning and specifies which coordinate system finally applied. The mechanism to report this information is implementation dependent.

Does it sound better? (apart from the english that should probably be fixed) "should report" or "may report"? Warning or Error?

gmantele commented 3 years ago

@mbtaylor , @Zarquan, @msdemlei , @pdowler , an opinion on this final proposition?

mbtaylor commented 3 years ago

Tricky. In practice a warning may well not be seen by TAP users. In some cases an error (query failure) would be the most appropriate thing: if the service sees 'ICRS' and knows it's using galactic coordinates, it would be a good idea to reject the query. But requiring services to check this is asking too much, and requiring an error for non-empty values may break backward compatibility: various TAP/ADQL examples stick 'ICRS' in as the first argument of these functions, which may give the right result, but the implementation may not be able to determine that the data are in ICRS so would end up rejecting otherwise working queries. So I don't think we can have a MUST here, and I'd be inclined to leave the details up to the implementations.

So I'd favour something like:

If a non-empty coordinate system is provided but ignored, implementations MAY reject the query or convey a warning using some implementation-dependent mechanism.

this doesn't really say anything that's not obvious, but at least it shows we've thought about it.

I'm happy to defer to others with better ideas.

Since as far as I know this feature has never been in much/any use, I doubt that whatever we do here will have much impact in practice.

msdemlei commented 3 years ago

On Wed, Sep 08, 2021 at 07:48:02AM -0700, Mark Taylor wrote:

So I'd favour something like:

If a non-empty coordinate system is provided but ignored, implementations MAY reject the query or convey a warning using some implementation-dependent mechanism.

The problem with this is that even the "ignored" is hard to define here, because the semantics of that first argument have always been shoddily defined. Dave's example of

CIRCLE('GALACTIC', 30.0, 30.0, 0.1)

illustrates that quite nicely: As such, I can't possibly ignore "Galactic", as there is no behaviour assigned to it (well, except when I'm serialising it to STC-S, which is another thing we'd like to get rid of). Of course, further up, when it might say

1=CONTAINS(point('ICRS', 10, 20), CIRCLE('GALACTIC', 30.0, 30.0, 0.1))

I say I'm ignoring it when I don't (or do?) rotate one of the arguments. But it's not even clear if the original intention was that or something else. Let's face it: this entire thing is broken in 2.0, and that's why we're deprecating it.

For the record, DaCHS does the rotation for a few bespoke frames (try

select top 1 * from annisred.main where
0=CONTAINS(point('ICRS', 10, 20), CIRCLE('GALACTIC', 30.0, 30.0, 0.1))

on http://dc.g-vo.org/tap and inspect the sql_query INFO). But I can't say I confidently believe any other behaviour is less justified, and there are many variations when DaCHS doesn't bother.

I've also fiddled around with warning against things like

  POINT('GALACTIC', raj2000, dej2000)

where I know that the frame passed doesn't match the columns. But when trying to write code, I encountered a neverending stream of crazy corner cases and decided it's not worth it (partly also because don't have a bespoke INFO for "Warning" that would then show up as a little icon in TOPCAT, which I would like to have -- but mainly because it's a messy pain full of heuristics).

In all: I think we should just deprecate the coosys argument and not try to fix the deficiencies 2.0 has with defining what it should do. Which means I'd say:

This specification makes no guarantees for any semantics for the deprecated COOSYS argument. Queries against services implementing ADQL 2.1 should omit it. For interoperability, queries against 2.0 services should not pass arguments with differing COOSYS arguments to CONTAINS or INTERSECTS, as behaviour is undefined in that case.

That is, I think, the best we can do. We can't retroactively fix 2.0. And inventing new semantics for things we want to get rid of is a waste of time.

Zarquan commented 3 years ago

THIS

because don't have a bespoke INFO for "Warning" that would then show up as a little icon in TOPCAT, which I would like to have

We have limited resources, and rather than spending time trying to handle all the side effects of a deprecated param, we should be looking at developing things we clearly need, like a reliable path for WARNING messages to get from the server to the user.

Zarquan commented 3 years ago

I agree with @msdemlei.

This specification makes no guarantees for any semantics for the deprecated COOSYS argument.

Lets do this and see. If it breaks too many things, we solve those specific things in version 3.x.

gmantele commented 3 years ago

Considering the complexity of this problem, I am not especially against either solution. Looking at all this suggestions, it seems we all agree that it is really something depending from the implementation and there is not standardized solution that would solve the issue in all implementations. So the best we can do is indeed to make users and implementers aware about this issue and ask them to be careful when using this deprecated argument.

So, as you suggest @Zarquan , let's go for the solution of @msdemlei . I'll publish a PullRequest very soon with it.

Thank you to all of you for this discussion and all your ideas. It was really very helpful :)

gmantele commented 3 years ago

because don't have a bespoke INFO for "Warning" that would then show up as a little icon in TOPCAT, which I would like to have

We have limited resources, and rather than spending time trying to handle all the side effects of a deprecated param, we should be looking at developing things we clearly need, like a reliable path for WARNING messages to get from the server to the user.

I fear it is something we can not do much about ; the way to report warnings and errors really depends on the service/protocol using ADQL (e.g. TAP), and then to each of its implementations. Not speaking only about COOSYS, all we could do is just to say that a warning/error should/must/may somehow be reported in implementations. But I agree that having a way to report warnings (and why not advice for improving a query, though this might be really tricky to do) in protocols like TAP would be really nice.

About deprecated features like COOSYS, since warnings reported by TOPCAT are coming from my library, I guess I could update my library so that it can detect whether there are deprecated things in the input query and prepare warnings that TOPCAT could then somehow report. That may differ from what the actual TAP service would really do when the query will be run in there, but it would already be some kind of improvement. Perhaps an idea to explore some day....