terrajobst / nquery-vnext

A Roslyn inspired rewrite of NQuery
MIT License
72 stars 16 forks source link

`CASE WHEN` expression labels evaluated eagerly instead of lazily #65

Open dallmair opened 1 month ago

dallmair commented 1 month ago

Background

We have a table similar to:

Type (string) Value (string)
text abc
date 07/27/2023 00:00:00

We're trying to transform the date value to a different format using NQuery with:

SELECT  Type
        , CASE WHEN Type = 'date' THEN FORMAT(TO_DATETIME(Value), 'yyyy-MM-dd') ELSE Value END AS Value
FROM    ourTable

Expected result

Returns table with date value transformed to 2023-07-27.

Actual result

Throws exception:

System.FormatException: The string 'abc' was not recognized as a valid DateTime. There is an unknown word starting at index '0'.
   at System.Convert.ToDateTime(String value, IFormatProvider provider)
   at System.String.System.IConvertible.ToDateTime(IFormatProvider provider)
   at System.Convert.ToDateTime(Object value, IFormatProvider provider)
   at NQuery.Symbols.BuiltInFunctions.ToDateTime(Object value) in .\nquery-vnext\src\NQuery\Symbols\BuiltInFunctions.cs:line 176
   at lambda_method11(Closure )
   at NQuery.Iterators.ComputeScalarIterator.Read() in .\nquery-vnext\src\NQuery\Iterators\ComputeScalarIterator.cs:line 41
   at NQuery.Iterators.ProjectionIterator.Read() in .\nquery-vnext\src\NQuery\Iterators\ProjectionIterator.cs:line 31
   at NQuery.QueryReader.Read() in .\nquery-vnext\src\NQuery\QueryReader.cs:line 43
   at NQuery.Data.QueryReaderExtensions.ExecuteDataTable(QueryReader queryReader) in .\nquery-vnext\src\NQuery.Data\QueryReaderExtensions.cs:line 21
   at NQueryViewer.MainWindow.<>c__DisplayClass15_0.<ExecuteQuery>b__0() in .\nquery-vnext\src\NQueryViewer\MainWindow.xaml.cs:line 143
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at NQueryViewer.MainWindow.ExecuteQuery() in .\nquery-vnext\src\NQueryViewer\MainWindow.xaml.cs:line 140

Analysis

The problem is caused by the caching of expression values in variables implemented in ExpressionBuilder.BuildCachedExpression. This code transforms expressions like a.b.c.Prop to

var temp1 = a.b;
var temp2 = temp1.c;
return temp2.Prop;

And it transforms our CASE WHEN Type = 'date' THEN FORMAT(TO_DATETIME(Value), 'yyyy-MM-dd') ELSE Value END to:

var temp1 = Type;
var temp2 = 'date';
var temp3 = Value;
var temp4 = TO_DATETIME(temp3);
var temp5 = 'yyyy-MM-dd';
return temp1 = temp2 ? FORMAT(temp4, temp5) : Value;

In words: It pulls calculations out of the CASE expression, changing the semantics (searched CASE expressions are short-circuited in SQL).

Open questions

As a little experiment, I removed caching by replacing all calls to BuildCachedExpression with BuildLiftedExpression. All unit tests are still green, but I'm unsure whether that is a good idea as I do not know:

  1. Why was the caching of expression values introduced?
  2. Does it have any measurable performance effect?
  3. Is it necessary for correctness in some cases I don't understand?
  4. Can caching be removed altogether? Or would this have negative effects, and it's necessary to disable it selectively to fix the CASE WHEN bug described here?

Highly appreciate any insights.

dallmair commented 1 month ago

Thinking about this some more, the existing caching is necessary for correctness due to NULL propagation semantics. For example, A() == B() must be compiled to:

var temp1 = A();
var temp2 = B();
return temp1 == null || temp2 == null ? null : temp1 == temp2;

Without caching, the evaluation would happen twice, which is at best undesired and at worst incorrect (when the function has side effects).

Thus, it makes sense to keep the general approach, and fix the problem in the compilation of CASE WHEN, because that is the only expression in SQL that has lazy-evaluation semantics.