prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.06k stars 5.38k forks source link

Greatest/Least Function Check NaN on REAL data types? #22391

Closed Real-Chen-Happy closed 7 months ago

Real-Chen-Happy commented 7 months ago

Hi, I am working on Greatest/Least function for PrestoSQL in Velox. In current PrestoDB implementations, Greatest/Least Function only checks NaN on DOUBLE data types.

@UsedByGeneratedCode
public static void checkNotNaN(String name, double value)
{
    if (Double.isNaN(value)) {
        throw new PrestoException(INVALID_FUNCTION_ARGUMENT, format("Invalid argument to %s(): NaN", name));
    }
}

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/AbstractGreatestLeast.java#L124

if (type.getTypeSignature().getBase().equals(StandardTypes.DOUBLE)) {
   for (Parameter parameter : parameters) {
        body.append(parameter);
        body.append(invoke(binder.bind(CHECK_NOT_NAN.bindTo(getSignature().getNameSuffix())), "checkNotNaN"));
   }
}

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/AbstractGreatestLeast.java#L160

Intuitively, might the same check also apply to REAL data type? I want to make sure that Greatest/Least function in Velox has the same behavior as the one in PrestoDB.

Thank you!

mbasmanova commented 7 months ago

CC: @rschlussel @tdcmeehan

rschlussel commented 7 months ago

We're changing the java presto function to follow our new nan definition, so we will be removing this check and treating NaN as greatest (I have a very much work in progress branch here: https://github.com/prestodb/presto/pull/22386/files). I was hoping to get this mostly done this week, but now I realized i'm oncall, so it may get delayed a week.

elharo commented 7 months ago

Definitely whatever you do for DOUBLE you should also do for REAL (float) types.

Real-Chen-Happy commented 7 months ago

We're changing the java presto function to follow our new nan definition, so we will be removing this check and treating NaN as greatest (I have a very much work in progress branch here: https://github.com/prestodb/presto/pull/22386/files). I was hoping to get this mostly done this week, but now I realized i'm oncall, so it may get delayed a week.

Just found this as the new definition of NaN under the issue https://github.com/prestodb/presto/issues/21936. I will be using it as the reference. https://github.com/prestodb/presto/wiki/Presto-NaN-behavior

edit: @rschlussel May I know if this doc is up-to-date? It seems like for greatest and least NaN will throw invalid argument error instead of treating it as the biggest.

rschlussel commented 7 months ago

Yes the new definition is the same as the one proposed for velox as well https://github.com/facebookincubator/velox/pull/7237. The new definition will define NaN=NaN as true and also consider NaN as largest for all ordering operations (>, max, sorting, etc.). That wiki page reflects the old behavior (which is still the current behavior) so we can track what's changing. It does not yet reflect the new behavior we are moving towards.