microsoft / fhir-server

A service that implements the FHIR standard
MIT License
1.2k stars 515 forks source link

Order of querystring parameters change the response time #2540

Closed holpit closed 7 months ago

holpit commented 2 years ago

Describe the bug The response time depends on parameters order

FHIR Version? R4

Data provider? SQL Server

To Reproduce Steps to reproduce the behavior: Make a GET to Fhir server with these paramaters takes 7 seconds (average)

http://localhost:5000/DocumentReference?_security=http://mycompany.com/CodeSystem/tenant|2&_count=10&_include=DocumentReference:subject&_include=DocumentReference:author&_include=DocumentReference:encounter&_sort=-date&patient:Patient.identifier=http://mycompany.com/CodeSystem/utenti|1012&status=current
  1. Make the same request but move the last parameter (status) on first position and the query takes 1.2 seconds (Average)

    http://localhost:5000/DocumentReference?status=current&_security=http://mycompany.com/CodeSystem/tenant|2&_count=10&_include=DocumentReference:subject&_include=DocumentReference:author&_include=DocumentReference:encounter&_sort=-date&patient:Patient.identifier=http://mycompany.com/CodeSystem/utenti|1012
  2. Same behaviour with _security parameter.

Expected behavior The response time is not influenced by the order of parameters

Actual behavior Reponse time are different

AB#118416

holpit commented 2 years ago

Nobody has the same issue?

brendankowitz commented 2 years ago

I'm wondering if it has something to do with the order that the SQL generator builds the CTEs, currently they "build/filter" on each other. Perhaps the first CTE/Expression create a faster or smaller resultset, this does sound like something to consider. Not specifically for this bug, but we're looking at refactoring the way the CTEs are joining together atm

brendankowitz commented 2 years ago

@SergeyGaluzo have you seen this type of behavior in any of your tests?

SergeyGaluzo commented 2 years ago

All our CTEs are joined together, SQL normalizes query to the base tables, so order of CTE should not matter. In some weird cases, because SQL optimizer spends limited time to consider all access permutations, "entry point" to optimization sequence could matter. This behavior is not reliable, and we should not count on it.

holpit commented 2 years ago

The query generated are different (obviously some parts identical) :

Status at the and:

-- Status at end
exec sp_executesql N'WITH cte0 AS
(
    SELECT ResourceSurrogateId AS Sid1
    FROM dbo.TokenSearchParam
    WHERE IsHistory = 0
        AND SearchParamId = @p0
        AND Code = @p1
        AND ResourceTypeId = @p2

),
cte1 AS
(
    SELECT ResourceSurrogateId AS Sid1
    FROM dbo.TokenSearchParam
    WHERE IsHistory = 0
        AND ResourceSurrogateId IN (SELECT Sid1 FROM cte0)
        AND SearchParamId = @p3
        AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = @p4)
        AND Code = @p5

        AND ResourceTypeId = @p2

),
cte2 AS
(
    SELECT refSource.ResourceSurrogateId AS Sid1, refTarget.ResourceSurrogateId AS Sid2 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p6
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (@p2)
        AND refSource.ReferenceResourceTypeId IN (@p7)
        AND refSource.ResourceSurrogateId IN (SELECT Sid1 FROM cte1)
        AND refSource.ResourceTypeId = @p2
),
cte3 AS
(
    SELECT Sid1, ResourceSurrogateId AS Sid2
    FROM dbo.TokenSearchParam
    INNER JOIN cte2
    ON ResourceSurrogateId = Sid2
    WHERE IsHistory = 0
        AND SearchParamId = @p8
        AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = @p9)
        AND Code = @p10

),
cte4 AS
(
    SELECT ResourceSurrogateId AS Sid1, StartDateTime as SortValue
    FROM dbo.DateTimeSearchParam
    WHERE IsHistory = 0
        AND SearchParamId = @p11

        AND ResourceTypeId = @p2

        AND ResourceSurrogateId IN (SELECT Sid1 FROM cte3)
),
cte5 AS
(
    SELECT DISTINCT TOP (@p12) Sid1, 1 AS IsMatch, 0 AS IsPartial , cte4.SortValue
    FROM cte4
    ORDER BY cte4.SortValue DESC,  Sid1 ASC
),
cte6 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p13
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p14) Sid1 FROM cte5)
),
cte7 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte6
),
cte8 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p15
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p16) Sid1 FROM cte5)
),
cte9 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte8
),
cte10 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p17
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p18) Sid1 FROM cte5)
),
cte11 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte10
),
cte12 AS
(
    SELECT Sid1, IsMatch, IsPartial , SortValue
    FROM cte5
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte7
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte9
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte11
)
SELECT DISTINCT r.ResourceTypeId, r.ResourceId, r.Version, r.IsDeleted, r.ResourceSurrogateId, r.RequestMethod, CAST(IsMatch AS bit) AS IsMatch, CAST(IsPartial AS bit) AS IsPartial, r.IsRawResourceMetaSet, r.RawResource, cte12.SortValue
FROM dbo.Resource r

INNER JOIN cte12
ON r.ResourceSurrogateId = cte12.Sid1
ORDER BY cte12.SortValue DESC, r.ResourceSurrogateId ASC 
OPTION(RECOMPILE)',N'@p0 smallint,@p1 varchar(128),@p2 smallint,@p3 smallint,@p4 nvarchar(256),@p5 varchar(128),@p6 smallint,@p7 smallint,@p8 smallint,@p9 nvarchar(256),@p10 varchar(128),@p11 smallint,@p12 int,@p13 smallint,@p14 int,@p15 smallint,@p16 int,@p17 smallint,@p18 int',@p0=417,@p1='current',@p2=42,@p3=1115,@p4=N'http://mycompany.com/CodeSystem/tenant',@p5='2',@p6=200,@p7=103,@p8=936,@p9=N'http://mycompany.com/CodeSystem/utenti',@p10='1012',@p11=404,@p12=11,@p13=418,@p14=10,@p15=400,@p16=10,@p17=187,@p18=10

Status first

-- Status start
exec sp_executesql N'WITH cte0 AS
(
    SELECT ResourceSurrogateId AS Sid1
    FROM dbo.TokenSearchParam
    WHERE IsHistory = 0
        AND SearchParamId = @p0
        AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = @p1)
        AND Code = @p2

        AND ResourceTypeId = @p3

),
cte1 AS
(
    SELECT refSource.ResourceSurrogateId AS Sid1, refTarget.ResourceSurrogateId AS Sid2 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p4
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (@p3)
        AND refSource.ReferenceResourceTypeId IN (@p5)
        AND refSource.ResourceSurrogateId IN (SELECT Sid1 FROM cte0)
        AND refSource.ResourceTypeId = @p3
),
cte2 AS
(
    SELECT Sid1, ResourceSurrogateId AS Sid2
    FROM dbo.TokenSearchParam
    INNER JOIN cte1
    ON ResourceSurrogateId = Sid2
    WHERE IsHistory = 0
        AND SearchParamId = @p6
        AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = @p7)
        AND Code = @p8

),
cte3 AS
(
    SELECT ResourceSurrogateId AS Sid1
    FROM dbo.TokenSearchParam
    WHERE IsHistory = 0
        AND ResourceSurrogateId IN (SELECT Sid1 FROM cte2)
        AND SearchParamId = @p9
        AND Code = @p10
        AND ResourceTypeId = @p3

),
cte4 AS
(
    SELECT ResourceSurrogateId AS Sid1, StartDateTime as SortValue
    FROM dbo.DateTimeSearchParam
    WHERE IsHistory = 0
        AND SearchParamId = @p11

        AND ResourceTypeId = @p3

        AND ResourceSurrogateId IN (SELECT Sid1 FROM cte3)
),
cte5 AS
(
    SELECT DISTINCT TOP (@p12) Sid1, 1 AS IsMatch, 0 AS IsPartial , cte4.SortValue
    FROM cte4
    ORDER BY cte4.SortValue DESC,  Sid1 ASC
),
cte6 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p13
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p14) Sid1 FROM cte5)
),
cte7 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte6
),
cte8 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p15
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p16) Sid1 FROM cte5)
),
cte9 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte8
),
cte10 AS
(
    SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch 
    FROM dbo.ReferenceSearchParam refSource
    INNER JOIN dbo.Resource refTarget
    ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId
        AND refSource.ReferenceResourceId = refTarget.ResourceId
    WHERE refSource.SearchParamId = @p17
        AND refTarget.IsHistory = 0
        AND refSource.IsHistory = 0
        AND refSource.ResourceTypeId IN (42)
        AND refSource.ResourceSurrogateId IN (SELECT TOP(@p18) Sid1 FROM cte5)
),
cte11 AS
(
    SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial 
    FROM cte10
),
cte12 AS
(
    SELECT Sid1, IsMatch, IsPartial , SortValue
    FROM cte5
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte7
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte9
    UNION ALL
    SELECT Sid1, IsMatch, IsPartial, NULL as SortValue 
    FROM cte11
)
SELECT DISTINCT r.ResourceTypeId, r.ResourceId, r.Version, r.IsDeleted, r.ResourceSurrogateId, r.RequestMethod, CAST(IsMatch AS bit) AS IsMatch, CAST(IsPartial AS bit) AS IsPartial, r.IsRawResourceMetaSet, r.RawResource, cte12.SortValue
FROM dbo.Resource r

INNER JOIN cte12
ON r.ResourceSurrogateId = cte12.Sid1
ORDER BY cte12.SortValue DESC, r.ResourceSurrogateId ASC 
OPTION(RECOMPILE)',N'@p0 smallint,@p1 nvarchar(256),@p2 varchar(128),@p3 smallint,@p4 smallint,@p5 smallint,@p6 smallint,@p7 nvarchar(256),@p8 varchar(128),@p9 smallint,@p10 varchar(128),@p11 smallint,@p12 int,@p13 smallint,@p14 int,@p15 smallint,@p16 int,@p17 smallint,@p18 int',@p0=1115,@p1=N'http://mycompany.com/CodeSystem/tenant',@p2='2',@p3=42,@p4=200,@p5=103,@p6=936,@p7=N'http://mycompany.com/CodeSystem/utenti',@p8='1012',@p9=417,@p10='current',@p11=404,@p12=11,@p13=418,@p14=10,@p15=400,@p16=10,@p17=187,@p18=10
SergeyGaluzo commented 2 years ago

Not sure that I understand. Let me clarify. Consider SELECT FROM T1 JOIN T2 ON T2.X = T1.X WHERE T1.Y = @y AND T2.Z = @z and SELECT FROM T2 JOIN T1 ON T1.X = T2.X WHERE T2.Z = @z AND T1.Y = @y Are these queries different? - Yes. Should SQL optimizer come with different execution plans? - No.

holpit commented 2 years ago

I agree with you but what I would to point out is that the dynamic query produced by Fhir should not depends on the parameters order. Let me explain with an example: Questionnarie?status=active&subject=123 Questionnaire?subject=123&status=active Why these requests produce a different query? Sure, the execution plan matter, but the fact is that are quite different query.

SergeyGaluzo commented 2 years ago

Do these queries produce different results if run without TOP clause? Or just order of CTEs is different, but results are the same (no TOP)?

holpit commented 2 years ago

The results are the same with or without TOP (_count) clause. The queries are yet different.

SergeyGaluzo commented 2 years ago

Is difference only in the order of CTEs?

holpit commented 2 years ago

Look at my previous post. Some cte is identical, some identical but in different position and some completely different.

SergeyGaluzo commented 2 years ago

I did tedious work of substituting parameters in both queries above. It appeared that queries are different only in the order of first 4 CTEs (from cte0 to cte3). Starting with cte4 and to cte12 they are identical. Please see below the result of this comparison. If I missed something, please let me know.

I expect that order of CTEs reflects order of input parameters. What is not desired is query plan generated in some cases.

Let me restate that we cannot rely on order of tables (driven from order of parameters) to be reflected in execution order.

What we are dealing here is an inefficient approach to represent filtering on M:M relationships. What we do now is JOIN filters (this produces dups in record sets) and DISTINCT results in last step (this "fixes" duplication). What we should consider is using EXISTS logic to avoid duplication in the first place. This should express our desire more clearly and should produce much less confusion for SQL optimizer.

WITH cte0 AS -- cte1 ( SELECT ResourceSurrogateId AS Sid1 FROM dbo.TokenSearchParam WHERE IsHistory = 0 AND SearchParamId = 1115 AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = 'http://mycompany.com/CodeSystem/tenant') AND Code = '2' AND ResourceTypeId = 42 ), cte1 AS -- cte2 ( SELECT refSource.ResourceSurrogateId AS Sid1, refTarget.ResourceSurrogateId AS Sid2 FROM dbo.ReferenceSearchParam refSource INNER JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId WHERE refSource.SearchParamId = 200 AND refTarget.IsHistory = 0 AND refSource.IsHistory = 0 AND refSource.ReferenceResourceTypeId = 103 AND refSource.ResourceSurrogateId IN (SELECT Sid1 FROM cte0) AND refSource.ResourceTypeId = 42 ), cte2 AS -- cte3 ( SELECT Sid1, ResourceSurrogateId AS Sid2 FROM dbo.TokenSearchParam INNER JOIN cte1 ON ResourceSurrogateId = Sid2 WHERE IsHistory = 0 AND SearchParamId = 936 AND SystemId IN (SELECT SystemId FROM dbo.System WHERE Value = 'http://mycompany.com/CodeSystem/utenti') AND Code = '1012' ), cte3 AS -- cte0 ( SELECT ResourceSurrogateId AS Sid1 FROM dbo.TokenSearchParam WHERE IsHistory = 0 AND ResourceSurrogateId IN (SELECT Sid1 FROM cte2) AND SearchParamId = 417 AND Code = 'current' AND ResourceTypeId = 42 ), cte4 AS -- cte4 ( SELECT ResourceSurrogateId AS Sid1, StartDateTime as SortValue FROM dbo.DateTimeSearchParam WHERE IsHistory = 0 AND SearchParamId = 404 AND ResourceTypeId = 42 AND ResourceSurrogateId IN (SELECT Sid1 FROM cte3) ), cte5 AS -- cte5 ( SELECT DISTINCT TOP (11) Sid1, 1 AS IsMatch, 0 AS IsPartial , cte4.SortValue FROM cte4 ORDER BY cte4.SortValue DESC, Sid1 ASC ), cte6 AS -- cte6 ( SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch FROM dbo.ReferenceSearchParam refSource INNER JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId WHERE refSource.SearchParamId = 418 AND refTarget.IsHistory = 0 AND refSource.IsHistory = 0 AND refSource.ResourceTypeId IN (42) AND refSource.ResourceSurrogateId IN (SELECT TOP(10) Sid1 FROM cte5) ), cte7 AS -- cte7 ( SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial FROM cte6 ), cte8 AS -- cte8 ( SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch FROM dbo.ReferenceSearchParam refSource INNER JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId WHERE refSource.SearchParamId = 400 AND refTarget.IsHistory = 0 AND refSource.IsHistory = 0 AND refSource.ResourceTypeId IN (42) AND refSource.ResourceSurrogateId IN (SELECT TOP(10) Sid1 FROM cte5) ), cte9 AS -- cte9 ( SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial FROM cte8 ), cte10 AS -- cte10 ( SELECT DISTINCT refTarget.ResourceSurrogateId AS Sid1, 0 AS IsMatch FROM dbo.ReferenceSearchParam refSource INNER JOIN dbo.Resource refTarget ON refSource.ReferenceResourceTypeId = refTarget.ResourceTypeId AND refSource.ReferenceResourceId = refTarget.ResourceId WHERE refSource.SearchParamId = 187 AND refTarget.IsHistory = 0 AND refSource.IsHistory = 0 AND refSource.ResourceTypeId IN (42) AND refSource.ResourceSurrogateId IN (SELECT TOP(10) Sid1 FROM cte5) ), cte11 AS -- cte11 ( SELECT DISTINCT Sid1, IsMatch, 0 AS IsPartial FROM cte10 ), cte12 AS -- cte12 ( SELECT Sid1, IsMatch, IsPartial , SortValue FROM cte5 UNION ALL SELECT Sid1, IsMatch, IsPartial, NULL as SortValue FROM cte7 UNION ALL SELECT Sid1, IsMatch, IsPartial, NULL as SortValue FROM cte9 UNION ALL SELECT Sid1, IsMatch, IsPartial, NULL as SortValue FROM cte11 ) SELECT DISTINCT r.ResourceTypeId, r.ResourceId, r.Version, r.IsDeleted, r.ResourceSurrogateId, r.RequestMethod, CAST(IsMatch AS bit) AS IsMatch, CAST(IsPartial AS bit) AS IsPartial, r.IsRawResourceMetaSet, r.RawResource, cte12.SortValue FROM dbo.Resource r INNER JOIN cte12 ON r.ResourceSurrogateId = cte12.Sid1 ORDER BY cte12.SortValue DESC, r.ResourceSurrogateId ASC OPTION(RECOMPILE)

holpit commented 2 years ago

Thank you very much @SergeyGaluzo for your deep check. I think your suggestion could fix the issue. Look forward to see this implementation.

mikaelweave commented 2 years ago

AB#90261 / AB#89919

brendankowitz commented 2 years ago

Thanks @SergeyGaluzo. @fhibf I believe what you are working on will cover this

EXPEkesheth commented 7 months ago

@holpit - The reported issue is fixed. Please reach out for any additional questions.