marcperrinoptel opened this issue 2 years ago
Thanks for raising this issue and giving a detailed explanation. We are discussing this issue internally.
We agree with what you suggested and are planning to investigate. If nothing comes of the investigation, we may just improve our current fix to at least address the issues you raised, as we don't encourage using very large parameter lists.
Yeah, as far as I know, the main scenario that mechanism was designed for (prefetch) does not trigger those problems (though concurrency could still be an issue, if insertion into and reads from the temp table are not atomic). If you do run into these problems, you probably could have done better anyway (e.g. a subquery instead of evaluating the list). Still, it's a gap relative to other backends, and interesting to investigate.
This issue looks very ugly. Can we create a different temp table every time we are called? Something like:
import uuid

def mssql_split_parameter_list_as_sql(self, compiler, connection):
    # Insert the IN-clause parameters, 1000 at a time, into a uniquely
    # named temp table, then rewrite the lookup as a subquery against it.
    lhs, _ = self.process_lhs(compiler, connection)
    _, rhs_params = self.batch_process_rhs(compiler, connection)
    table_name = "#Temp_params" + uuid.uuid4().hex
    with connection.cursor() as cursor:
        parameter_data_type = self.lhs.field.db_type(connection)
        temp_table_collation = 'COLLATE DATABASE_DEFAULT' if 'char' in parameter_data_type else ''
        cursor.execute(f"CREATE TABLE {table_name} (params {parameter_data_type} {temp_table_collation})")
        for offset in range(0, len(rhs_params), 1000):
            sqls_params = rhs_params[offset:offset + 1000]
            cursor.execute(f"INSERT INTO {table_name} VALUES " + ",".join("(%s)" for _ in sqls_params), sqls_params)
    in_clause = f"{lhs} IN (SELECT params FROM {table_name})"
    return in_clause, ()
Aren't those tables dropped automatically?
@jmah8
Could we use table value parameters to handle this?
If we have a long list of ints, we can create a table type:
CREATE TYPE django_mssql_int_list AS TABLE
(
[Id] INT NOT NULL
);
and then our query would be
in_clause = lhs + ' IN ' + '(SELECT id from %s)'
# I'm not sure if this is how you pass a custom type table value
return in_clause, (tuple(["django_mssql_int_list"] + [[x] for x in rhs_params]),)
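Sketching what that might look like as a helper (purely illustrative: the helper name is made up, and how the row sequence actually gets bound to the django_mssql_int_list type is driver-specific and not shown here):

```python
def tvp_in_clause(lhs_sql, values):
    # Hypothetical helper: rewrite the lookup as a subquery over a
    # table-valued parameter, so the whole list travels as ONE query
    # parameter instead of one parameter per value.
    in_clause = f"{lhs_sql} IN (SELECT [Id] FROM %s)"
    rows = [(v,) for v in values]  # one single-column row per value
    return in_clause, (rows,)
```

Whether this helps depends entirely on the driver accepting a sequence of rows as a single TVP-typed parameter.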
@federicoemartinez
I haven't worked on this project for a long time. I recommend checking with someone currently working on the project as they would be a better person to ask.
We just independently discovered this (a query with two different long IN filters applying the same param list to both). IMO this is a really serious problem: quietly doing the wrong thing is much worse than exploding due to the 2100-parameter limit. (One of those IN lists could very well be a row-permissions check, for example...)
Maybe having randomly-named temp tables is a plausible hack, but honestly the whole #Temp_params mechanism feels pretty doomed to me. It generally feels pretty bad to be doing database access inside as_sql, so that we're hitting the server every time we prepare the query string (even just for logging purposes etc).

split_parameter_list_as_sql was intended to solve a different, IN-specific problem in another DBMS, not as a general workaround for parameter limits. Limits also affect other DBMSes and are not worked around in their backends; I think if this is to be tackled, it should be done consistently for all backends by Django, like it was for e.g. bulk_create.
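For reference, that IN-specific splitting can be done entirely during SQL generation, with no server round trip; a minimal sketch of the idea (names are illustrative, not Django's actual API), which, as noted, works around a limit on IN-list length but leaves the total parameter count unchanged:

```python
def split_in_clause(lhs_sql, values, max_size=1000):
    # Rewrite `lhs IN (v1 ... vN)` as `(lhs IN chunk1 OR lhs IN chunk2 ...)`
    # so no single IN list exceeds max_size. The total number of query
    # parameters stays the same, so this does NOT avoid a 2100-parameter limit.
    chunks = [values[i:i + max_size] for i in range(0, len(values), max_size)]
    sqls = [f"{lhs_sql} IN ({', '.join(['%s'] * len(chunk))})" for chunk in chunks]
    return "(" + " OR ".join(sqls) + ")", tuple(v for chunk in chunks for v in chunk)
```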
I haven't investigated the performance implications, but I've noticed that Entity Framework uses OPENJSON subqueries to pass parameter lists, similarly to what is described here: https://stackoverflow.com/a/60705570
Following that approach, the query from the first post could be made to use only 2 parameters and look something like:
SELECT (...)
FROM (...)
WHERE (
    [a] IN (
        SELECT [f0].[value]
        FROM OPENJSON(@param1) WITH ([value] int '$') AS [f0])
    AND
    [b] IN (
        SELECT [f1].[value]
        FROM OPENJSON(@param2) WITH ([value] int '$') AS [f1])
)
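A minimal sketch of generating such a clause from Python (assuming SQL Server 2016+ and int values; the helper name is illustrative):

```python
import json

def openjson_in_clause(lhs_sql, values):
    # Serialize the whole list into one JSON string, so the IN filter
    # costs exactly one query parameter regardless of list length.
    sql = (f"{lhs_sql} IN (SELECT [value] "
           "FROM OPENJSON(%s) WITH ([value] int '$'))")
    return sql, (json.dumps(list(values)),)
```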
That's an interesting approach that seems less immediately doomed.
There'd obviously be some restrictions on what the types of the parameters can be though if they have to go through JSON.
(Also it's only available from SQL Server 2016.)
That SQL Server limitation is painful indeed; the current code to work around it involves a temporary table (#Temp_params, via max_in_list_size and split_parameter_list_as_sql), but breaks very easily with simple test cases. Here are 2 examples:

range_a = list(range(1100))
range_b = list(range(2000, 3100))

my_model.filter(a__in=range_a).filter(b__in=range_b)

generates:

SELECT (...) FROM (...)
WHERE (
    [a] IN (SELECT params FROM #Temp_params)
    AND [b] IN (SELECT params FROM #Temp_params)
)
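The wrong results can be reproduced in plain Python: with a single shared #Temp_params, both IN subqueries effectively test membership in the union of the two parameter lists:

```python
range_a = set(range(1100))
range_b = set(range(2000, 3100))

# What the shared #Temp_params table ends up holding: both lists.
shared_temp_table = range_a | range_b

def correct(a, b):
    # What the ORM asked for: each column checked against its own list.
    return a in range_a and b in range_b

def with_shared_table(a, b):
    # What the generated SQL does: both columns checked against the union.
    return a in shared_temp_table and b in shared_temp_table
```

For example, a row with a=5, b=10 should be filtered out (10 is not in range_b), but it passes the shared-table version.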