ivoa-std / ADQL

Astronomical Data Query Language Standard
https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
Creative Commons Attribution Share Alike 4.0 International

Fix `DISTANCE` description #28

Closed gmantele closed 4 years ago

gmantele commented 4 years ago

Markus Demleitner raised the point that the two-argument version of DISTANCE() allowed two general geometries instead of two points. Computing the distance between two points is easy, but computing it between two geometries requires a clear explanation of how that distance is defined.

To paraphrase Markus:

Being general here is a pain in the neck (actually, that's why I ran into this question). For one, you'll need to define distance much more carefully for such geometries, and if (as I think we ought to) we choose "minimum of the distances between all points in arg 1 and arg 2", I doubt we'll see many correct implementations of that. Also I'll want to map a lot of DISTANCE calls into contains(point, circle) statements (because that's much easier on the query planner), and that's a pain if one of the points could actually be, say, a polygon.

This issue is about restricting both arguments of the two-argument DISTANCE() to POINTs.

Complete email thread on http://mail.ivoa.net/pipermail/dal/2020-February/008268.html
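
To illustrate the point-only case and the rewrite Markus mentions, here is a minimal Python sketch. It assumes coordinates in degrees and uses the haversine formula; the function names are hypothetical and are not part of ADQL or of any TAP implementation. It shows that a predicate such as `DISTANCE(POINT(ra, dec), POINT(10, 20)) <= 0.5` selects the same rows as `1 = CONTAINS(POINT(ra, dec), CIRCLE(10, 20, 0.5))`, which only holds as long as both arguments are points.

```python
import math


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def point_in_circle(ra, dec, center_ra, center_dec, radius):
    """Hypothetical stand-in for
    1 = CONTAINS(POINT(ra, dec), CIRCLE(center_ra, center_dec, radius)).
    """
    # A point lies in the circle exactly when its distance to the
    # centre does not exceed the radius.
    return angular_distance(ra, dec, center_ra, center_dec) <= radius
```

If one argument could be a polygon, the distance test no longer reduces to a single circle containment test, which is the difficulty for the query planner described above.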

gmantele commented 4 years ago

Pat Dowler remarked that the function CENTROID() returns a POINT and can thus be used as a valid argument of the two-argument DISTANCE() function:

[1] the prose is probably there because CENTROID returns a <point>, e.g. DISTANCE(CENTROID(foo), ...)

Grégory Mantelet then proposed the following BNF for DISTANCE():

<distance> ::=
    DISTANCE(<coord_value>, <coord_value>)
  | DISTANCE(<numeric_value_expression>, <numeric_value_expression>,
             <numeric_value_expression>, <numeric_value_expression>)

<coord_value> ::= <point_value> | <column_reference>

<point_value> ::= <point> | <centroid>

See email: http://mail.ivoa.net/pipermail/dal/2020-February/008270.html
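
As a rough illustration of what this grammar accepts, the following Python sketch checks a list of argument kinds against the two DISTANCE() forms. The kind labels (`"point"`, `"centroid"`, `"column_reference"`, `"numeric"`) and the function name are hypothetical; a real ADQL parser would derive them from its own syntax tree.

```python
# Argument kinds allowed by <coord_value> in the proposed BNF
# (hypothetical labels, chosen for this sketch only).
POINT_LIKE = {"point", "centroid", "column_reference"}


def check_distance_args(kinds):
    """Return True if the argument kinds match one of the two DISTANCE forms."""
    if len(kinds) == 2:
        # DISTANCE(<coord_value>, <coord_value>)
        return all(k in POINT_LIKE for k in kinds)
    if len(kinds) == 4:
        # DISTANCE(<numeric>, <numeric>, <numeric>, <numeric>)
        return all(k == "numeric" for k in kinds)
    return False
```

With this check, `DISTANCE(CENTROID(foo), POINT(...))` is accepted, while a call with a circle or polygon as an argument is rejected.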

gmantele commented 4 years ago

Alberto Micol suggested using the OGC standard's definition of the distance between two geometries:

Distance (anotherGeometry: Geometry):Double — Returns the shortest distance between any two Points in the two geometric objects as calculated in the spatial reference system of this geometric object. Because the geometries are closed, it is possible to find a point on each geometric object involved, such that the distance between these 2 points is the returned distance between their geometric objects.

He also proposed the following BNF:

DISTANCE <left_paren> <geometry_value_function> <comma> <geometry_value_function> <right_paren>

<geometry_value_function> ::=
        <centroid>
      | <circle>
      | <point>
      | <polygon>
      | <user_defined_function>
      | <union_of_geometries>

<union_of_geometries> ::=
        UNION <coord_sys> <left_paren>
            <geometry_value_function>
            { <geometry_value_function> } ?
        <right_paren>

With this BNF, he suggested adding a syntax for the union of geometries.

See http://mail.ivoa.net/pipermail/dal/2020-February/008271.html
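
To make the OGC-style definition quoted above concrete, here is a minimal Python sketch for the simplest non-point case: circles on the sphere. It assumes coordinates and radii in degrees, and the function names are hypothetical. For two circles, the shortest distance between any two of their points is the separation of the centres minus both radii, floored at zero when the circles touch or overlap.

```python
import math


def separation(p, q):
    """Great-circle distance in degrees between two (ra, dec) pairs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    # Haversine formula, clamped against rounding errors.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def circle_distance(c1, r1, c2, r2):
    """Shortest distance between two spherical circles; 0 if they touch or overlap."""
    return max(0.0, separation(c1, c2) - r1 - r2)


def point_circle_distance(p, c, r):
    """A point is treated as a circle of radius 0."""
    return circle_distance(p, 0.0, c, r)
```

Circles have this closed form; polygons do not, since the minimum may lie between two edges, between an edge and a vertex, or be zero because one shape contains the other. That is the kind of case where correct implementations are harder to get right.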


Grégory Mantelet preferred to be a bit more careful and to postpone the introduction of geometries in DISTANCE() to a later version of ADQL:

(extract from a reply to Alberto, whose numbered points are each followed by Grégory's answer):

(2) The definition already exists in the universally-adopted OGC standard: "the distance between two geometries is the shortest distance between any two points in those two geometries" (see: OpenGIS® Implementation Standard for Geographic information - Simple feature access - Part 1: Common architecture available at: http://portal.opengeospatial.org/files/?artifact_id=25355 )

It sounds interesting. It should definitely be something to think of for the future of ADQL. As I said, if we do so, we would probably have to review all other geometries to make everything consistent.

Besides, doing so will probably lead to another (annoying but unfortunately necessary) question: ADQL-2.2 or ADQL-3.0? (and I am not starting this discussion here)

(3) Many DBMSes already operationally use such definition (e.g., PostGIS, SQLServer, ORACLE).

...and some existing databases (and extensions such as PgSphere) behind TAP services would have to evolve to follow such standard...that may take (unfortunately) time.

(4) Adopting any different definition would only cause confusion to everybody.

...it would be confusing only to people already using such geometries in databases....which may not be the case for the majority of our users. But yes, I agree, it would be much better to follow an existing worldwide standard.

(5) Adopting existing standards can only speed up our VO work.

With that definition we would be fine and ready for the future (for some of the implementations), or ready for the present (for some other implementations).

With the above definition, and with the grammar that I already proposed in an earlier email, we are ready to proceed with no further delays to the publication of the ADQL2.1.

To conclude my thoughts, I would propose that the possibility to support the OGC standard should be postponed to the next version of ADQL (2.2 or 3.0). Do not think that I do not like the idea....it is just that I prefer an evolution of ADQL as smooth as possible, otherwise we risk to break some existing related services and we definitely do not want that.

See http://mail.ivoa.net/pipermail/dal/2020-February/008277.html


Alberto concluded with the following points:

A standard cannot be based on what pgsphere can or cannot do. Let’s not block the VO development because some (oldish) software component cannot do better.

My conclusions are:

  • The definition of distance between two geometries is well-defined and used world-wide
  • There is no reason to think of a new ADQL, version 2.2 or 3.0, ADQL2.1 can be achieved in May.
  • It is just only matter of allowing who can do more to do more.
  • If old implementations cannot change, well, they won’t. an error message will be shown; it is matter of documenting this in the ADQL2.1 standard.
  • I have already provided all is needed, including definition, and small grammar changes, for a speedy implementation of “distance” in ADQL2.1. Nothing else is needed.

See http://mail.ivoa.net/pipermail/dal/2020-March/008293.html


Markus concluded:

A standard cannot be based on what pgsphere can or cannot do.

...but a standard needs to be based on what is likely to be implemented. It's no good specifying behaviour that, in all likelihood, most services will not exhibit.

[...]

When we know that almost all services will raise an error, we still shouldn't promise it -- that's just bad user experience (where I give you we already have plenty of unexpected error messages because of incomplete or not-quite-complete implementations in the VO. But let's not wantonly create more).

And again, there's nothing wrong with ESO allowing a few more things than ADQL does. DaCHS has done that for years, and we even have mechanisms to say that there's custom features in a concrete service. Do that, and if it proves popular with your users, I'm sure someone will come up and implement it in pgsphere (and other components), too.

See http://mail.ivoa.net/pipermail/dal/2020-March/008295.html

gmantele commented 4 years ago

I have just realised that CENTROID() may not be the only function able to return a POINT. A UDF could return a geometry, and therefore a POINT too.

Knowing that, I think it is a good idea to also add <user_defined_function> to <point_value>.