oracle / dotnet-db-samples

.NET code samples for Oracle database developers #OracleDotNet
http://otn.oracle.com/dotnet
MIT License
413 stars 191 forks source link

String StartsWith() and EndsWith() unexpected parameter null check #223

Closed sccunningham7 closed 2 years ago

sccunningham7 commented 2 years ago

When executing a query such as:

var query = _context.MyTable.AsQueryable();

if (!string.IsNullOrEmpty(firstName))
   query = query.Where(x => x.FirstNme.StartsWith(firstName.ToUpper()));

var items = await query.ToListAsync();

I'm receiving the following generated SQL:

-- Region Parameters
-- :ToUpper_0='?' (Size = 90)
-- EndRegion
SELECT *
FROM "MY_SCHEMA"."MY_TABLE" "a"
WHERE ((:ToUpper_0 IS NULL ) OR ((("a"."FIRST_NME" IS NOT NULL) AND ((("a"."FIRST_NME" LIKE :ToUpper_0 || '%') AND (SUBSTR("a"."FIRST_NME", 1, LENGTH(:ToUpper_0)) = :ToUpper_0))))));

I would expect the following to be generated instead:

-- Region Parameters
-- :ToUpper_0='?' (Size = 90)
-- EndRegion
SELECT *
FROM "MY_SCHEMA"."MY_TABLE" "a"
WHERE (((("a"."FIRST_NME" IS NOT NULL) AND ((("a"."FIRST_NME" LIKE :ToUpper_0 || '%') AND (SUBSTR("a"."FIRST_NME", 1, LENGTH(:ToUpper_0)) = :ToUpper_0))))));

The lead (:ToUpper_0 IS NULL) OR in the upper statement causes indexes to be skipped. Is this intended behavior?

The execution cost of the upper command is 174x higher than the one below. UseRelationalNulls() in OracleDbContextOptionsBuilder does not affect the generated SQL. I'm not sure if this is an issue with EF Core itself either.

Using Microsoft.EntityFrameworkCore 3.1.18 Oracle.EntityFrameworkCore 3.19.120

Have also attempted with EF Core 5 and ODP 5, same results on either version.

alexkeh commented 2 years ago

Since ODP.NET and EF Core can't detect that your C# code is already checking for nulls, they won't know to eliminate the null check in the generated SQL.

You can use your own custom SQL directly with the FromSqlRaw method.

sccunningham7 commented 2 years ago

I'd rather avoid FromSqlRaw as writing any kind of raw untyped SQL is exactly what I'm trying to prevent by using LINQ to SQL. I have instead written something that generates my own expression trees, but this leads to the SQL not being parameterized which is not preferred (I'm probably doing something wrong here).

If I expand on the first example by setting one string to null and chain together multiple String StartsWith() queries, as I am doing in my actual code:

var query = _context.MyTable.AsQueryable();

string firstName = null;
string lastName = "DOE";

// ToUpper removed to avoid throwing null inside the lambda
query = query.Where(x => x.FirstNme.StartsWith(firstName));
query = query.Where(x => x.LastNme.StartsWith(lastName));

var items = await query.ToListAsync();

The Generated SQL is going to return nothing because I assume the parser sees the null variable

SELECT *
FROM "MY_SCHEMA"."MY_TABLE" "a"
WHERE 0 = 1;

Correct me if I'm wrong, but this null check seems to be applicable only if the string itself is empty but not null (and thus treated as null by Oracle). Is there no further way to optimize this to avoid this null check that is causing a performance issue? Or is writing either raw SQL or my own expressions the only real solution?

alexkeh commented 2 years ago

I'm not that knowledgeable in how to modify LINQ settings to generate better SQL in this particular case. I'll ask my team members if they have some ideas. Sometimes there are solutions and sometimes not. It's part of the tradeoff between productivity vs. run-time performance that app developers make with EF Core and LINQ. The generated SQL isn't always optimally performing.

sccunningham7 commented 2 years ago

I fully understand that the SQL generated will not always be optimal. Building queries dynamically works fine except for these string methods, so it would be a big pain to switch to raw SQL for only some parts of the logic.

The solution I've settled on is to use the EF.Functions.Like method in place of any StartsWith, EndsWith, or Contains string operations. So rewriting my first example:

var query = _context.MyTable.AsQueryable();

// input strings are sanitized for any wildcards or special characters, and escaped
if (!string.IsNullOrEmpty(firstName))
   query = query.Where(x => EF.Functions.Like(x.FirstNme, $"{firstName}%", "\\"));

var items = await query.ToListAsync();

Which yields:

-- Region Parameters
-- :Format_1='?' (Size = 90)
-- EndRegion
SELECT *
FROM "MY_SCHEMA"."MY_TABLE" "a"
WHERE "a"."FIRST_NME" LIKE :Format_1 ESCAPE '\';

Which uses the indexes and has much more acceptable performance. This is not ideal and the syntax isn't the cleanest, but it gets past the null check that is otherwise causing the issues I'm having.