ivoa-std / ADQL

Astronomical Data Query Language Standard
https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
Creative Commons Attribution Share Alike 4.0 International
7 stars 7 forks source link

ADQL geometry language feature identifier version inconsistency #71

Closed mbtaylor closed 1 year ago

mbtaylor commented 1 year ago

I raised this on the DAL list in 2019 but didn't get a response, so I'll try it here instead...

In the current ADQL 2.1 draft, geometrical functions are labelled using the following Language Feature identifier:

ivo://ivoa.net/std/TAPRegExt#features-adql-geo

However in ADQL 2.0, the corresponding functionality was marked by the feature (defined in TAPRegExt 1.0 Sec 2.3):

ivo://ivoa.net/std/TAPRegExt#features-adqlgeo

Note the lack of a second hyphen.

This looks like a typo, so I suggest changing adql-geo to adqlgeo in the current text. If not, there is an inconsistency between ADQL 2.0 and 2.1. I can prepare a PR on request.

This change could be problematic if there are already services out there using the adql-geo form as written in the current draft. However, I suspect that there are few or none.

RegTAP does not record language features, but if I look for all the services claiming to support ADQL 2.1:

select ivoid, detail_xpath, detail_value, access_url
from rr.res_detail
natural join rr.capability
natural join rr.interface
where detail_xpath = '/capability/language/version/@ivo-id'
  and detail_value = 'ivo://ivoa.net/std/ADQL#v2.1'
  and standard_id = 'ivo://ivoa.net/std/tap'
  and intf_type='vs:paramhttp'

there are 24. The couple I tried (both DaCHS) use adqlgeo not adql-geo already (i.e. in accordance with my suggested change, and in violation of the draft as currently written). I suspect the others are mostly DaCHS instances too, so have the same behaviour.

Since I spotted this in 2019 (STILTS v3.2 and later) taplint has issued an Error if the adql-geo form is used, like this:

E-CAP-KEYX-1 Unknown standard feature key "ivo://ivoa.net/std/TAPRegExt#features-adql-geo" for language ADQL-2.1

i.e. taplint currently agrees with my suggested change rather than what's written in the current ADQL 2.1 draft. So my guess is that this change is not going to cause problems.

This issue has come up now because ESDC are using the adql-geo form in their ADQL 2.1 services, as in the current draft, and have hit that taplint error.

gmantele commented 1 year ago

Sorry if your email of 2019 (that's indeed a long time ago) did not get any answer. And thank you to revive this in here (though I would have answered also my email, I think).

This skipped hyphen is indeed a problem for existing services and validators which use and declare such language feature.

It is a shame that ivo://ivoa.net/std/TAPRegExt#features-adqlgeo passed in ADQL-2.0 without this hyphen. All other identifiers of language feature have this hyphen ; all are prefixed by ivo://ivoa.net/std/TAPRegExt#features-adql-. I don't know if the missing hyphen in ADQL-2.0 was a typo or was intentional. Either way, now services adopted this missing hyphen it will hard to add it again and have consistent language features identifiers in ADQL-2.1.

I see three solutions (in my personal order of preference):

  1. Have all identifiers consistent. That's to say, keep the hyphen in ADQL-2.1 (so, a change since 2.0). And pray that services will update easily and quickly (which in practice may not be the case for all of them). Maybe an ADQL-2.0 Erratum would be appropriate too.
  2. Have all identifiers with the hyphen, except for geometrical features whose identifier will be ivo://ivoa.net/std/TAPRegExt#features-adqlgeo. Ugly, but no harm for anyone. In such case, I would add a comment in the ADQL-2.1 document to emphasize the missing hyphen is not a typo and should stay as such for historical reasons.
  3. Have all identifiers without this hyphen. So the prefix would be: ivo://ivoa.net/std/TAPRegExt#features-adql. Still, no harm for anyone but all identifiers will be a bit less readable.

Considering these identifiers are for machines only and so, are hidden from the users eyese all the time, I guess it is better to go for 2 or 3. None would break anything on the contrary to 1.

Mark, anyone else, what do you think?

mbtaylor commented 1 year ago

If my chronology is right, I don't think that either TAPRegExt (2012) or ADQL 2.0 (2008) can be blamed for this, since at the time of their publication there were no other identifiers of the form ivo://ivoa.net/std/TAPRegExt#features-adql*

History aside, I think that option 1 would be too disruptive, since there are many ADQL-2.0 services out there already using adqlgeo.

Option 2 is as you say ugly and annoying but not really harmful; I agree that a comment in ADQL-2.1 noting the discrepancy for historical reasons is a very good idea.

Option 3 may be OK (ADQL 2.1 is after all still not a REC) depending on how many ADQL-2.1 services and clients there are out there using these identifiers, and also how amenable to updating they are. Does anybody have an opinion or information on that? Maybe @msdemlei demlei does since as I noted above many/most look like DaCHS.

But on the whole I'd say my preference is option 2.

msdemlei commented 1 year ago

On Fri, Jan 27, 2023 at 10:22:06AM -0800, Grégory Mantelet wrote:

Mark, anyone else, what do you think?

Well, there is a fourth option. You see, I originally only stuck the identifiers into TAPRegExt because back then we said we shouldn't create identifiers in ADQL from TAPRegExt. In retrospect, I think none of that was a wise decision (that's particularly easy for me to say in this case because I said that back then already). We could rectify that to some extent now by saying:

(a) The legacy identifiers (features-udf, features-adqlgeo) remain in the TAPRegExt record.

(b) All new feature ids are now in the ADQL record. That would let us drop the adql fragment in the identifiers (which isn't too pretty anyway), and the form of the others would be

ivo://ivoa.net/std/adql#features-string

or so.

I think most clients would either discard the non-fragment part (which, for comparison purposes, would actually be smart, as the ivoid itself needs to be compared case-insensitively by ivoid rules -- yikes!) or compare the whole strings (which, given the case crazyness, actually is less right), so that probably wouldn't make implementations a lot uglier.

[now that I mention it: I think ADQL should say that while the feature identifiers contain an ivoid and that is case-insensitive, servers MUST use all-lowercase and clients MAY compare without case normalisation. Should I raise an issue of that? Or write a PR immediately?]

So... unless we're worried about ADQL 2.1 implementations already deployed, that would actually be my favourite outcome: There's a bit of easily-identified legacy, and there's new-style, consistent practice.

Now that I think of it, I could even see deprecating the TAPRegExt#adqlgeo identifier in 2.1 and creating a synonymous adql#features-(adql?)geo identifier, requiring clients to accept both. But I think the features-udf key would need to remain in TAPRegExt (I'd argue it's not really specific to ADQL), so perhaps that's something we can do in 2.2 just as well if we find that might help.

gmantele commented 1 year ago

Thank you to both of you for your answers.

Although ADQL-2.1 is not yet a REC, there are already services implementing and declaring ADQL-2.1. That's a shame...but that's completely fair as ADQL-2.1 is taking a very long time to reach REC, and some use-cases sometimes require some of the ADQL-2.1 features...people cannot wait forever. I hope we will not have to wait so long for the next ADQL versions :crossed_fingers: .

Anyway, as the first intent of this Issue is to not break existing ADQL-2.1 implementations and because @msdemlei said (about his suggested 4-th option):

unless we're worried about ADQL 2.1 implementations already deployed, that would actually be my favourite outcome

I agree with @mbtaylor : we should go for the 2nd Option (ugly for picky human eyes, but harmless for servers and clients, existing or not):

  1. Have all identifiers with the hyphen, except for geometrical features whose identifier will be ivo://ivoa.net/std/TAPRegExt#features-adqlgeo.

I am now going to prepare a Pull-Request fixing this IVOID.

I will also add into this Pull-Requests 2 other modifications:

  1. Add a comment about TAPRegExt#features-adqlgeo stating this is not a typo and this should stay as such for historical reasons.
  2. Add a comment about IVOID that, as they must be compared case-insensitively, servers MUST use all-lowercase and clients MAY compare without case normalisation (thank you @msdemlei for this suggestion)