Closed gggrace14 closed 4 days ago
cc @mbasmanova @kaikalur
CC: @aditi-pandit @tdcmeehan @majetideepak
CC: @amitkdutta
Velox currently allows leading and trailing whitespace in cast(varchar as
Interesting! I am just curious what do other systems like MySQL, duckdb etc do here ? This should be easily fixable in presto Java without breaking any client assumptions.
I believe this case should be easily fixable by making Presto's behavior compiling with ANSI SQL.
I doubt any client/user would expect and depend on a throw for varchar -> number conversion because of some spaces (although this is not impossible, plenty of weird cases out there).
ANSI SQL says it very clear, thanks @gggrace14 for digging this out!
DubkDB should ignore whitespaces too - Velox uses it in the unit tests.
Related to #21001 (another blank padding bug)
This works fine in Postgres. My guess is this is because in Postgres, it's not a conversion from varchar to bigint, but char(3) to bigint. The extra space is treated as blank padding, Postgres is consistent in ignoring the blank padding for comparisons, and also apparently when converting between types.
Even this query succeeds, and I suspect it's because there's an implicit coercion from varchar to char for the sake of the cast: SELECT CAST (CAST('100 ' AS VARCHAR(4)) AS INTEGER);
In Presto, it's actually a conversion from varchar to bigint. This causes an issue here, but also in the issue above. For varchar, I don't think blank padding semantics apply.
FYI, there will be result differences between Presto and Velox for try_cast, e.g. NULL in Presto vs. 12 in Velox:
presto:di> select try_cast(' 12' as integer);
_col0
-------
NULL
(1 row)
FYI, there will be result differences between Presto and Velox for try_cast, e.g. NULL in Presto vs. 12 in Velox:
presto:di> select try_cast(' 12' as integer); _col0 ------- NULL (1 row)
Yes, this is true. I should've also included in the description.
So a common case we can predict is the result will have more records if the user has filter like try_cast() IS NOT NULL.
If we want to proceed with the alignment, we should make it into a major Presto release.
Note, that Presto's behavior is also inconsistent. It ignores whitespaces for floating point numbers conversion:
SELECT CAST(' 6.0 ' AS REAL)
_col0
-------
6.0
But throws an error for integers. Only because Java itself has this behavior, rather than the conscious choice of a query engine. Presto Java uses Long.parseLong, which behaves like this:
Parses the string argument as a signed decimal long. The characters in the string must all be decimal digits, except that the first character may be an ASCII minus sign '-' (\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value. The resulting long value is returned, exactly as if the argument and the radix 10 were given as arguments to the parseLong(java.lang.String, int) method. Note that neither the character L ('\u004C') nor l ('\u006C') is permitted to appear at the end of the string as a type indicator, as would be permitted in Java programming language source code.
Submitting https://github.com/prestodb/presto/pull/22284 to change the behavior.
Current behavior of casting varchar to integral types is to throw exception if the varchar is valid but contains leading or trailing whitespaces. For example,
throws the following exception
This is different from the behavior described in ANSI SQL
Accordingly, TRY(CAST(varchar AS) and TRY_CAST(varchar AS ) return NULL rather than the actual valid integral number.
Should Presto change the above CAST behavior to accept and trim leading and trailing whitespaces and be compliant with ANSI SQL?
The exception is thrown because the implementation of casting varchar to integral types directly delegates to Java, ex. https://github.com/prestodb/presto/blob/04138bc26a4fc801630fe936463d89d071d00c0a/presto-main/src/main/java/com/facebook/presto/type/VarcharOperators.java#L176-L187 And Java does not accept whitespaces or any character other than decimal digits and sign character. https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#parseLong-java.lang.String-
Your Environment
Expected Behavior
Casting of varchar trims leading and trailing whitespaces.
Current Behavior
Casting of varchar to integral types throws exception if the varchar contains leading or trailing whitespaces
Possible Solution
Casting of varchar trims leading and trailing whitespaces before delegating to Java
Long.parseLong()
.Steps to Reproduce
Screenshots (if appropriate)
Context
Throwing exception makes Presto have a different semantics of above CAST from ANSI SQL and other mainstream engines or libraries that compliant with ANSI SQL, ex., MySQL, folly, etc.