resolve TBD on how to use interval type in ADQL

pdowler commented 3 years ago

At some point in the past, we discussed defining an "overlaps" function. The semantics are simple but finding a good name... not so much.

One possibility we discussed was overloading INTERSECTS with two arguments, but I don't know what impact that would have on grammar complexity.

I do not have any current use case to justify a "contains" function, but it is equally easy to define.
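For reference, a minimal sketch of the two predicates under discussion. It assumes closed numeric intervals given as (lower, upper) pairs with lower <= upper; the names `overlaps` and `contains` are illustrative only, not proposed ADQL function names.

```python
from typing import Tuple

# (lower, upper); assumed closed, with lower <= upper (an assumption of this sketch)
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if every point of `inner` also lies within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```

With closed bounds, two intervals that merely touch at an endpoint count as overlapping; open or half-open bounds would need strict comparisons instead.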

pdowler commented 3 years ago

We have implemented and make use of INTERSECTS(<interval>, <interval>) = 1 in several CADC TAP services. We support columns of type interval so generally this is used with one column ref and one constant defined with the INTERVAL function.
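A sketch of what such a query could look like when built from client code. The table name, column name and bound values are hypothetical, and the assumption that INTERVAL takes the lower and upper bound as its two arguments is not confirmed in this thread.

```python
def interval_overlap_query(table: str, column: str, lo: float, hi: float) -> str:
    """Build an ADQL query keeping rows whose interval column overlaps [lo, hi]."""
    # Assumed constructor form: INTERVAL(<lower>, <upper>)
    return (
        f"SELECT * FROM {table} "
        f"WHERE INTERSECTS({column}, INTERVAL({lo!r}, {hi!r})) = 1"
    )


# Hypothetical table and column names, for illustration only.
query = interval_overlap_query("obs.exposures", "time_bounds", 1.0, 2.5)
print(query)
```

The helper interpolates its arguments directly into the query text, so it is only suitable for trusted, hard-coded names and numeric bounds.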

gmantele commented 3 years ago

I do not remember this discussion, but now that I am aware of it, this TBD makes sense to me. So, what was the outcome of that past discussion about interval overlaps?

gmantele commented 3 years ago

Personally, I think it is possible without negative impact on the grammar. Here are the modifications I am thinking of:

<interval_function> ::=
    <interval_contains> | <interval_intersects>

<interval_contains> ::=
    CONTAINS <left_paren> <interval_value_expression> <comma> <interval_value_expression> <right_paren>

<interval_intersects> ::=
    INTERSECTS <left_paren> <interval_value_expression> <comma> <interval_value_expression> <right_paren>

<interval_value_expression> ::=
    <column_reference>

<numeric_value_function> ::=
      <trig_function>
    | <math_function>
    | <interval_function>
    | <numeric_geometry_function>
    | <user_defined_function>

<interval_value_expression> is just limited to a column reference for the moment because there is currently no way to create an interval in ADQL. But since an interval is just composed of two time bounds, I think its constructor would be as simple as follows:

<interval_value_expression> ::=
      <column_reference>
    | INTERVAL <left_paren> <time_value_expression> <comma> <time_value_expression> <right_paren>

<time_value_expression> ::=
      <column_reference>
    | <timestamp>
    | <cast_function>

Note that the exact definition of <time_value_expression> would depend on the outcome of issue #11 about constructor vs cast.
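As a rough illustration of the first form of the grammar above, where both arguments are column references, the following sketch recognizes an <interval_function> with a regular expression. It is a simplification, not a real ADQL parser: column references are limited to regular identifiers, optionally dot-qualified, and the function name `parse_interval_function` is hypothetical.

```python
import re

# Simplified <column_reference>: regular identifiers, optionally dot-qualified.
# Delimited (double-quoted) identifiers are deliberately left out of this sketch.
_COLUMN = r"[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)*"

_INTERVAL_FUNCTION = re.compile(
    rf"\s*(CONTAINS|INTERSECTS)\s*\(\s*({_COLUMN})\s*,\s*({_COLUMN})\s*\)\s*",
    re.IGNORECASE,
)


def parse_interval_function(text: str):
    """Return (function_name, left_column, right_column), or None if no match."""
    match = _INTERVAL_FUNCTION.fullmatch(text)
    if match is None:
        return None
    name, left, right = match.groups()
    return name.upper(), left, right
```

Because the arguments are plain column references here, a call that uses an INTERVAL(...) constructor as an argument is rejected by this sketch.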

gmantele commented 3 years ago

Sorry for my last comment/proposal. I naively thought that the interval notion was about time rather than just numeric values. Proof that I was not yet really aware of this new datatype in ADQL/DALI and that I answered a bit too quickly :confounded:

gmantele commented 3 years ago

With this fresh new year, I sense there is more thinking needed here. Here are my current personal thoughts on this topic:

Interval can easily be confused with time interval (and maybe other kinds of interval).

In PostgreSQL, MySQL and MS-SQLServer, the datatype INTERVAL is related to time; it is generally the type used to express the difference between two dates/times.

Then, using the keyword INTERVAL for numeric values might be ambiguous later, when we need time intervals (especially now that the time domain topic is on the rise in the VO).

Postgres has a datatype named "RANGE", which comes in variants depending on the type of the values it represents, for instance: int4range, numrange, tsrange (for timestamp), daterange, ...

I rather like this term (range) instead of interval....

...but I cannot find anything similar for MySQL and MS-SQLServer, though I assume it can easily be emulated with something like an array or a string (with a comma separating the interval/range bounds). If we go for this solution, we would first have to ensure that it works with MySQL and MS-SQLServer, which do not support arrays.

INTERSECTS (and CONTAINS) are for the moment defined for a geometrical purpose.

We will have to make clear that they can apply to something else (which should not be a problem, I think).

Considering these, my current feeling is that we should wait for next version of ADQL (2.2 or 3.0).

But these are, of course, preliminary thoughts. Any more comments, opinions are welcome.
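The string-based fallback mentioned above, for backends without a native range type, could be sketched as follows. The "lower,upper" encoding and both function names are assumptions made for illustration, not an agreed serialization.

```python
from typing import Tuple


def encode_interval(lo: float, hi: float) -> str:
    """Serialize the bounds as 'lower,upper' for storage in a plain string column."""
    if lo > hi:
        raise ValueError("lower bound must not exceed upper bound")
    return f"{float(lo)!r},{float(hi)!r}"


def decode_interval(text: str) -> Tuple[float, float]:
    """Parse a 'lower,upper' string back into a pair of floats."""
    lo_text, hi_text = text.split(",")
    return float(lo_text), float(hi_text)
```

A backend using this encoding would still have to split the string before it can compare bounds, which is one reason to check how well such an emulation works in practice.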

msdemlei commented 3 years ago

On Wed, Jan 13, 2021 at 12:55:41AM -0800, Grégory Mantelet wrote:

Then, using the keyword INTERVAL for numeric values might be ambiguous later, when we need time intervals (especially now that the time domain topic is on the rise in the VO).

I agree here, but we've already an xtype interval in DALI, and I suspect people will be cross with us if we don't try hard to keep the terminology consistent at least within the VO.

However, as far as I can see, ADQL won't need any keyword for the type (unless we want to let people cast to intervals).

Postgres has a datatype named "RANGE", which comes in variants depending on the type of the values it represents, for instance: int4range, numrange, tsrange (for timestamp), daterange, ...

I rather like this term (range) instead of interval....

If we could start again, you'd have my vote. The way things are... hm. But as Postgres makes clear, "interval" isn't a type, "interval-of-type" is. So -- we could just use the rangeX types in casts if we wanted to, and we could add mappings like

int4range   -> datatype="integer" arraysize="2" xtype="interval"

in our type mapping table. Hm.
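Such a mapping could be held in a simple lookup table, as in the sketch below. Only the int4range row comes from the comment above (including its spelling of the datatype value); the dictionary and function names are hypothetical, and rows for other range types would have to be agreed on separately.

```python
# Only the int4range row is taken from the discussion; its values are copied as written.
RANGE_TYPE_MAPPING = {
    "int4range": {"datatype": "integer", "arraysize": "2", "xtype": "interval"},
}


def field_attributes(db_type: str) -> str:
    """Render the mapped attributes for a database range type (KeyError if unmapped)."""
    attrs = RANGE_TYPE_MAPPING[db_type]
    return " ".join(f'{name}="{value}"' for name, value in attrs.items())


print(field_attributes("int4range"))
```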

INTERSECTS (and CONTAINS) are for the moment defined for a geometrical purpose.

We will have to make clear that they can apply to something else (which should not be a problem, I think).

Agreed -- it's not ultra-nice that we introduce more items that need type introspection in the translator, but it's not the end of the world either.
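To make the type-introspection point concrete, here is a sketch of how a translator might dispatch INTERSECTS on its resolved argument types. It assumes a PostgreSQL backend whose interval columns are stored as range types (where && is the overlap operator). The type labels and the `geom_intersects` placeholder are hypothetical and stand in for whatever a service already uses for geometries.

```python
def translate_intersects(left_sql: str, left_type: str,
                         right_sql: str, right_type: str) -> str:
    """Translate INTERSECTS(left, right) according to the resolved argument types."""
    if left_type == right_type == "interval":
        # PostgreSQL range types use && for overlap; ADQL expects 1 or 0 back.
        return f"CASE WHEN {left_sql} && {right_sql} THEN 1 ELSE 0 END"
    if left_type == right_type == "geometry":
        # Hypothetical placeholder for the service's existing geometry translation.
        return f"geom_intersects({left_sql}, {right_sql})"
    raise TypeError(f"INTERSECTS is not defined for ({left_type}, {right_type})")
```

The translator has to know the argument types before it can choose a branch, which is the extra introspection cost mentioned above.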

Considering these, my current feeling is that we should wait for next version of ADQL (2.2 or 3.0).

Yes, definitely. We should have frozen ADQL 2.1 five years ago.

gmantele commented 3 years ago

I agree that it is now too late to change the terminology; we will deal with interval as best we can. Sorry, I thought about that but forgot to write it...

The thing is that at some point we will have to deal with time intervals, and at that moment we will have to figure out another name/terminology/trick (e.g. a prefix like time_)... Anyway, that will probably be the topic of another discussion.

However, as far as I can see, ADQL won't need any keyword for the type (unless we want to let people cast to intervals).

But I fear it is going to happen. At some point, we may have to create intervals inside an ADQL query. CAST can be used for this purpose, but probably a constructor as well (but that's another topic... #11)... In any case, the term interval will have to be used explicitly.

Agreed -- it's not ultra-nice that we introduce more items that need type introspection in the translator, but it's not the end of the world either.

Agreed.

Zarquan commented 3 years ago

Considering these, my current feeling is that we should wait for next version of ADQL (2.2 or 3.0).

Yes, definitely. We should have frozen ADQL 2.1 five years ago.

Absolutely, freeze it now, no new functionality, get ADQL 2.1 done and then we can move on to the PEG grammar. New functionality will be much easier to consider after we have moved to the PEG grammar.

ivoa-std / ADQL

resolve TBD on how to use interval type in ADQL #44