yv989c / BlazarTech.QueryableValues

This library allows you to efficiently compose an IEnumerable<T> in your Entity Framework Core queries when using the SQL Server database provider.
Other
91 stars 6 forks source link

Possibility of using implicit version of OpenJson? #38

Closed IanKemp closed 11 months ago

IanKemp commented 1 year ago

Currently the SQL generated by this library invokes OpenJson with explicit mappings, for example:

declare @p0 nvarchar(max) = '[{"X":0,"I":123},...,{"X:"99","I":987}]' // 100 integer values
declare @__p_1 int = 3000
declare @__p_2 int = 100

SELECT [r].[Id], [r].[Amount], [r.AccountId]
FROM [Transactions] AS [r]
WHERE EXISTS (
    SELECT 1
    FROM (
        SELECT TOP(@p1) [X], [I]
        FROM OPENJSON(@p0) WITH ([X] int, [I] int)
        ORDER BY [X]
    ) AS [b]
    WHERE [b].[I] = [r].[AccountId]) // AccountId is an integer
OFFSET @__p_1 ROWS FETCH NEXT @__p_2 ROWS ONLY

From my testing the above is slower than using the implicit mappings version:

declare @p0 nvarchar(max) = '[123,...,987]' // 100 integer values
declare @__p_1 int = 3000
declare @__p_2 int = 100

SELECT [r].[Id], [r].[Amount], [r.AccountId]
FROM [Transactions] AS [r]
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(@p0) AS [b]
    WHERE CONVERT(INT, [b].[value]) = [r].[AccountId])
OFFSET @__p_1 ROWS FETCH NEXT @__p_2 ROWS ONLY

because the former query incurs Sort and Top operations, versus only a Compute Scalar for the former. The latter also sends a smaller amount of data to the server in the JSON-serialised parameter.

What is the reason for using the explicit mappings version instead of implicit? And would you be willing to add functionality to control whether implicit or explicit is used?

yv989c commented 1 year ago

Hey @IanKemp , thanks for the feedback.

The OPENJSON_EXPLICIT approach currently used by this library should perform better because the query engine does not have to do any implicit casting at runtime, which is what happens when you use the implicit one you are suggesting. If you run your last query in SSMS and check its execution plan you should even see a warning about possible incorrect cardinality estimations. If you're curious, I had a detailed discussion about this with one member of the EF team here: https://github.com/dotnet/efcore/issues/13617#issuecomment-1556016939

In the previous link you can see how EF 8.0.0-preview.4.23259.3 –which implemented the approach you are suggesting– performed significantly worse than QueryableValues, which relies on the explicit approach.

Regarding the TOP and ORDER BY. The TOP is an optimization/attempt to help the query engine with a more accurate memory grant estimation. The ORDER BY does incurs a small performance hit but it's needed for correctness in other scenarios. Unfortunately because of the EF infrastructure I'm using to make this work, I can't remove it in your use case where it isn't relevant.

If you have evidence or some metrics that can help demonstrate the issue you are having, please share them here. You can also try using an older version of this library before I introduced the fix for ordering guarantees.

If this library gives you some value, please consider ⭐ the repo. It helps.🙂

yv989c commented 1 year ago

Oh btw. If possible for your use case, you can also try with a Join instead of Contains. It will typically give you better performance. You can find an example here.

IanKemp commented 1 year ago

Thanks @yv989c for both this package and your response. I've given you a ⭐ as requested, it really is the least that I can do.