iTwin / imodel-native

Apache License 2.0
21 stars 13 forks source link

ECSQL -> SQL conversion adds unnecessary clauses, causing poor query performance #912

Open grigasp opened 6 days ago

grigasp commented 6 days ago

I have the following ECSQL query:

SELECT count(1)
FROM bis.SpatialCategory c
WHERE EXISTS (
  SELECT 1
  FROM bis.GeometricElement3d element
  WHERE
    element.Model.Id = ?
    AND element.Category.Id = c.ECInstanceId
    AND element.Parent IS NULL
)

For some specific Models in my test iModels the query takes a lot of time to execute, e.g.:

iModel Model.Id Time it takes to execute query
A 0x1700000001c7 4.97 s.
B 0x240000000018 13.67 s.
C 0x20000000030 12.43 s.
D 0x20000001979 16.73 s.

When looking at why it's slow, I found that it includes a calculated ParentRelECClassId column into the WHERE clause, which makes the query slow:

SELECT COUNT(1)
FROM (
    SELECT [ElementId] ECInstanceId,
      [ECClassId]
    FROM [main].[bis_DefinitionElement]
    WHERE [bis_DefinitionElement].ECClassId = 169
  ) [c]
WHERE EXISTS(
    SELECT 1
    FROM (
        SELECT [bis_GeometricElement3d].[ElementId] ECInstanceId,
          [bis_GeometricElement3d].[ECClassId],
          [bis_Element].[ModelId],
          [bis_Element].[ParentId],
          (
            CASE
              WHEN [bis_Element].[ParentId] IS NULL THEN NULL
              ELSE [bis_Element].[ParentRelECClassId]
            END
          ) [ParentRelECClassId],
          [bis_GeometricElement3d].[CategoryId]
        FROM [main].[bis_GeometricElement3d]
          INNER JOIN [main].[bis_Element] ON [bis_GeometricElement3d].[ElementId] = [bis_Element].[Id]
      ) [element]
    WHERE [element].[ModelId] = ?
      AND [element].[CategoryId] = [c].[ECInstanceId]
      AND (
        [element].[ParentId] IS NULL
        AND [element].[ParentRelECClassId] IS NULL
      )
  )

As a workaround, I found that I can add .Id to the parent nullity check (... AND element.Parent.Id IS NULL), in which case the unnecessary clause in the SQL is not added:

SELECT COUNT(1)
FROM (
    SELECT [ElementId] ECInstanceId,
      [ECClassId]
    FROM [main].[bis_DefinitionElement]
    WHERE [bis_DefinitionElement].ECClassId = 169
  ) [c]
WHERE EXISTS(
    SELECT 1
    FROM (
        SELECT [bis_GeometricElement3d].[ElementId] ECInstanceId,
          [bis_GeometricElement3d].[ECClassId],
          [bis_Element].[ModelId],
          [bis_Element].[ParentId],
          [bis_GeometricElement3d].[CategoryId]
        FROM [main].[bis_GeometricElement3d]
          INNER JOIN [main].[bis_Element] ON [bis_GeometricElement3d].[ElementId] = [bis_Element].[Id]
      ) [element]
    WHERE [element].[ModelId] = ?
      AND [element].[CategoryId] = [c].[ECInstanceId]
      AND [element].[ParentId] IS NULL
  )

This change substantially improves query performance:

iModel Model.Id Before change After change
A 0x1700000001c7 4.97 s. 0.06 s.
B 0x240000000018 13.67 s. 0.01 s.
C 0x20000000030 12.43 s. 4.43 s.
D 0x20000001979 16.73 s. 10.13 s.

Ideally, I think ECDb should notice that ParentRelECClassId column is not used and omit it from converted SQL query altogether, or at least not include it in the WHERE clause.

Note: I was running the investigation on a local Windows machine using an optimized build of imodel console. However, I also confirmed this can be reproduced in the deployed imodel console (iModel B; query without .Id takes ~21 s.; query with .Id takes ~0.4 s.).

Please contact me personally if you need access to the test iModels.

khanaffan commented 5 days ago

For fix we should not select ParentRelECClassId if it was not used in anyway in query.