BulkUpdateAsync goes scanning unexpected index while proper exists

tivivi63 commented 4 weeks ago

Description

When dealing with unbalanced volumes in BulkUpdateAsync operation

1000 rows to be updated
2 columns each
against a 14 million rows table
that has a compound primary key (KeyCol1, KeyCol2),

generated TSQL command is like the following:

UPDATE DestinationTable
SET     [ValueCol1] = StagingTable.[ValueCol1], [ValueCol2] = StagingTable.[ValueCol2]
FROM [A] AS DestinationTable 
INNER JOIN (SELECT TOP 100 PERCENT * FROM #ZZZProjects_e37b28b2_f487_42f5_a820_c500b660b971
WHERE ZZZ_Index >= @IndexStart AND ZZZ_Index <= @IndexEnd ORDER BY ZZZ_Index) AS StagingTable ON DestinationTable.[KeyCol1] = StagingTable.[keyCol1] AND DestinationTable.[KeyCol2] = StagingTable.[KeyCol2]

Actually, execution plan is awfull because UPDATE goes scanning foreign key constraint index FK_REL_A_KeyCol1 (that only fits half the compound key), while it could obviously use according PK_A clustered index which exposes both required KeyCol1/KeyCol2. So 87% of query execution is wasted in such a wonderful mess.

Craziest thing is that the process knows about the clustered PK index because it is updating it in the end (the remaining 13%)!

Don't you think it'd be wiser to reverse the statement and set ZZZ as source table for UPDATE, while there would only be 1000 operations (and not 14 millions) and each of them would use PK_A clustered index SEEK and not a weird scan based on inaccurate foreign key one.

IMHO, your lib is no candidate at bulk update/merge querying against million/hundreds of million-rows tables. This is the painpoint of using EF and such extensions.

Further technical details

Database Server version: [SQL Server 2019]

JonathanMagnan commented 4 weeks ago

Hello @tivivi63 ,

Could you share your table creation (you can change the column name if needed, but make sure to include all of them), including your PK and Index/FK? You can send it in private here if needed: info@zzzprojects.com

We can surely re-create the scenario on our side without this info, but we want to make sure we have a similar setup as your. For example, just the number of columns/types might have a big impact on this specific problem.

It's indeed really weird that SQL Server chose to use an Index Scan instead of a PK, but there is some scenario that could explain this, which will indeed make it the most efficient resolution.

If you wish, you can already try on your side by forcing your PK for the index to use when joining with the option TableHintSql:

context.BulkUpdate(list, options =>
{
    options.TableHintSql = "WITH (INDEX(PK_EntitySimples))";
});

We usually do not have any problems on our side, even with tables containing millions of data.

Another solution you can also try if you want is to create a new index, which only contains both your key columns (KeyCol1, KeyCol2) to see if this one will be the preferred index taken by SQL Server over the FK index. I know your PK, which is probably your clustered index, already has exactly this, but the problem might be indeed that this is your clustered index (I will give a better explanation later if that already solves your performance issue).

Best Regards,

Jon

tivivi63 commented 3 weeks ago

Hello Jon,

Please find attached the table creation script you requested.

Regarding the TableHintSql option, is it available on both Core and legacy .Net frameworks? Actually I am database administrator but the dev people I'm working with are (for a few more months) split into "legacy" EF and EF "Core".

Hinting queries is something I can lean towards, as far as

it is drop-count usage and not generalized (especially index forcing)
there is actual knowledge and mastering of the database model and the potential consequences

Finally, may a "basic"

context.BulkUpdate(list, options =>
{
    options.TableHintSql = "WITH (FORCESEEK)";
});

be an affordable solution in our case?

Actually I'm no fond of EF-generated statements (especially those nasty lack-of-stats subqueries joins) at all, but if I may positively influence mates' queries towards a bit more performance even through hinting, I must apply.

Can you provide with reference technical documentation link regarding these Bulk statements I may study? Thanks,

JonathanMagnan commented 3 weeks ago

Hello @tivivi63 ,

I had some fun this morning with your script. As I initially expected, the partition plays a great role in why an index could be preferred to the PK

I created 3 tables (using your creation scripts):

A: A table with indexes and partitioned
B: A table without index (only the PK) and partitioned
C: A table with indexes but not partitioned

Each table has exactly the same rows (10 million).

When performing some basic SELECT directly with columns from the PK, the table C was always the fastest. Following very closely from the table A. The table B was by far the worst (not even close).

So, when a table is partitioned, SQL Server uses an index instead of a clustered key, which is way faster if it believe that's the best strategy.

I don't want to enter too much detail here because I could be wrong in my explanation, but the main reason is probably that the clustered index is split into multiple groups (saved in all different files in my case), while the index is only saved in 1 file. So getting the "matching" information from a single file is often the best solutions instead of looking at multiple different files.

Regarding the TableHintSql option, is it available on both Core and legacy .Net frameworks?

Yes, it's available in both

Finally, may a "basic" "WITH (FORCESEEK)" be an affordable solution in our case?

I'm not sure, but you can easily test it on your side and look at the performance

As said previously, perhaps SQL Server already uses the most optimized ways after all.

Actually I'm no fond of EF-generated statements

I'm not either ;)

Can you provide with reference technical documentation link regarding these Bulk statements I may study?

All the documentation can be found here: https://entityframework-extensions.net/bulk-extensions

However, I do not believe you will learn anything you are looking for.

Let me know if that answer all your questions.

Best Regards,

Jon

JonathanMagnan commented 2 weeks ago

Hello @tivivi63,

Since our last conversation, we haven't heard from you.

Let me know if you need more information.

Best regards,

Jon

zzzprojects / EntityFramework-Extensions

BulkUpdateAsync goes scanning unexpected index while proper exists #598

Description

Further technical details