prisma / tiberius

TDS 7.2+ (Microsoft SQL Server) driver for Rust
Apache License 2.0
326 stars 119 forks source link

Impossible to do an "IN" prepared statement query? #157

Open servonlewis opened 3 years ago

servonlewis commented 3 years ago

hey guys, very simple, but how can we achieve something like this? It looks like the (@P1) is not accepted, and i do not know how to make it spit out the query after it is generated.

//Would like to use a prepared statement like this
pub const QUERY_LIST_CMDB_CI_SERVERS: &str = r#"
    SELECT
    cmdb.sys_id,
    cmdb.name,
    cmdb.u_tech_owner,
    u.email as 'pri_owner_email',
    cmdb.dv_u_tech_owner,
    cmdb.u_active,
    cmdb.short_description,
    cmdb.dv_u_cbt_owner,
    cmdb.u_cbt_owner
    FROM  cmdb_ci cmdb
    LEFT JOIN sys_user u on u.sys_id=cmdb.u_tech_owner 
    WHERE cmdb.name IN (@P1)
    OR cmdb.sys_id IN (@P1)
    AND cmdb.u_active = 'true'
    AND (cmdb.sys_class_name = 'cmdb_ci_voice_hardware' 
    OR cmdb.sys_class_name LIKE '%server%')
    AND cmdb.sys_class_name NOT LIKE '%web_server%'
    AND cmdb.sys_class_name NOT IN (
    'cmdb_ci_app_server_websphere',
    'cmdb_ci_app_server_jboss',
    'cmdb_ci_app_server_tomcat',
    'cmdb_ci_appl_license_server',
    'cmdb_ci_app_server_weblogic'
    );
"#;
// Instead of doing a format like this:

pub fn query_list_cmdb_ci_servers(keys: Vec<String>) -> String {
    format!(
        r#"
    SELECT
    cmdb.sys_id,
    cmdb.name,
    cmdb.u_tech_owner,
    u.email as 'pri_owner_email',
    cmdb.dv_u_tech_owner,
    cmdb.u_active,
    cmdb.short_description,
    cmdb.dv_u_cbt_owner,
    cmdb.u_cbt_owner
    FROM  cmdb_ci cmdb
    LEFT JOIN sys_user u on u.sys_id=cmdb.u_tech_owner 
    WHERE cmdb.name IN ({0})
    OR cmdb.sys_id IN ({0})
    AND cmdb.u_active = 'true'
    AND (cmdb.sys_class_name = 'cmdb_ci_voice_hardware' 
    OR cmdb.sys_class_name LIKE '%server%')
    AND cmdb.sys_class_name NOT LIKE '%web_server%'
    AND cmdb.sys_class_name NOT IN (
    'cmdb_ci_app_server_websphere',
    'cmdb_ci_app_server_jboss',
    'cmdb_ci_app_server_tomcat',
    'cmdb_ci_appl_license_server',
    'cmdb_ci_app_server_weblogic'
    );
    "#,
        keys.join(",")
    )
}
pimeys commented 3 years ago

Hey! So, SQL Server doesn't really have support for vectors like some other databases, such as PostgreSQL do with their array operations, which would make it possible to write your query like you do in the first example.

So, we have the statement and we have the parameters. Using an IN statement, you must have a different statement for different array sizes, e.g. an array of size 2 needs IN (@P1, @P2) and size of 3 goes obviously as IN (@P1, @P2, @P3).

This means a constant query will not work, but you need to evaluate a new statement per array size, and pass the parameters flattened to the slice.

What we do in Prisma is we have a query builder, that will convert a convenient data structure into a query string and parameters. You can read how it's done in here: https://github.com/prisma/quaint/tree/master/src (see the ast and visitor code).

I warn you though of using that crate, or at least you should pin to a certain version of the master branch. We had to fork some of our dependencies ages ago and we can't really guarantee API stability anymore at this point. But, if you don't need super complex structures, what you can do is a method that should somehow go like this:

fn my_in_statement_query(params: &[something]) -> String {
    format!("... WHERE IN ({}) ...", params.iter().enumerate(), map(|(i, _)| format!("P{}", i)).collect::<Vec<_>>().join(","))
}

Now this one is not very efficient, and depending on your payload, it might use way more RAM in the server than you'd expect. Also, SQL Server has a size limit how many parameters you can have per query. The limit is about 2000-2500 parameters, which you can hit quite easily with a free-form array, leading to an error and a crash, making you to do stupid things as splitting the query to multiple small queries, plummeting the performance. I think only PostgreSQL solved this in an elegant way, and you can do something like:

SELECT * FROM tbl WHERE id = ANY ($1);

Here the parameter can be an array, and you can use the same statement for all sizes of the array! If you find a nice way of doing the same in SQL Server, and we miss a feature that would enable it, please tell me!

servonlewis commented 3 years ago

Thank you for the amazing comment!

Yes this is something I've been dealing with for some time now but most people tend to shy away from this question.

I use and love Postgres, so the array types are just amazing for me, but sadly, at work we only use MSSQL.

That said, do you feel it is worth it to do the mapping? It may be inefficient, but I think it will at least prevent sql injection, which is the whole reason for this post honestly.

pimeys commented 3 years ago

In any ways you should design your application so you would not need large IN queries. If you have to, you might get an error if you have too many parameters, but you can handle that and it will not be a problem.

I highly highly recommend parameterizing everything that goes to the database, always!

And, SQL Server is great in some ways compared to PostgreSQL. But arrays it cannot support, so design your APIs accordingly.

esheppa commented 3 years ago

Another option is to use relational operators. You can have your data for your in clause stored in a Vec<String> in Rust and then use serde_json::to_string to serialize this to a string and bind as a parameter. On the T-SQL side, you can use openjson(@P1) where @P1 is your serialized IN data. You can then join your current table to this one to do the filtering.

A basic example (from Rust you would bind a parameter rather than declaring one, I've just done this so the example can be run entirely in SQL):

declare @P1 nvarchar(max) = '[1,2,3,4,5]';
select a.value
from openjson(@P1) a

this returns the following data:

value
1
2
3
4
5

For your specific example, you could use something similar to this:

let filters_vec = vec!["filter1".to_string(), ...];
let filters = serde_json::to_string(&filters_vec).unwrap();
client.query(" SELECT
    cmdb.sys_id,
    cmdb.name,
    cmdb.u_tech_owner,
    u.email as 'pri_owner_email',
    cmdb.dv_u_tech_owner,
    cmdb.u_active,
    cmdb.short_description,
    cmdb.dv_u_cbt_owner,
    cmdb.u_cbt_owner
    FROM  cmdb_ci cmdb
    INNER JOIN openjson(@P1) a on a.value = cmdb.name or a.value = cmdb.sys_id
    LEFT JOIN sys_user u on u.sys_id=cmdb.u_tech_owner 
    WHERE cmdb.u_active = 'true'
    AND (cmdb.sys_class_name = 'cmdb_ci_voice_hardware' 
    OR cmdb.sys_class_name LIKE '%server%')
    AND cmdb.sys_class_name NOT LIKE '%web_server%'
    AND cmdb.sys_class_name NOT IN (
    'cmdb_ci_app_server_websphere',
    'cmdb_ci_app_server_jboss',
    'cmdb_ci_app_server_tomcat',
    'cmdb_ci_appl_license_server',
    'cmdb_ci_app_server_weblogic'
    )",
  &[&filters],
  )

there are also some alternatives to OPENJSON for instance STRING_SPLIT which may be useful if you don't already depend on serde_json.

servonlewis commented 3 years ago

Very interesting!! Will try it out

Another option is to use relational operators. You can have your data for your in clause stored in a Vec<String> in Rust and then use serde_json::to_string to serialize this to a string and bind as a parameter. On the T-SQL side, you can use openjson(@P1) where @P1 is your serialized IN data. You can then join your current table to this one to do the filtering.

A basic example (from Rust you would bind a parameter rather than declaring one, I've just done this so the example can be run entirely in SQL):


declare @P1 nvarchar(max) = '[1,2,3,4,5]';

select a.value

from openjson(@P1) a

this returns the following data:

| value |

| --|

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

For your specific example, you could use something similar to this:


let filters_vec = vec!["filter1".to_string(), ...];

let filters = serde_json::to_string(&filters_vec).unwrap();

client.query(" SELECT

    cmdb.sys_id,

    cmdb.name,

    cmdb.u_tech_owner,

    u.email as 'pri_owner_email',

    cmdb.dv_u_tech_owner,

    cmdb.u_active,

    cmdb.short_description,

    cmdb.dv_u_cbt_owner,

    cmdb.u_cbt_owner

    FROM  cmdb_ci cmdb

    INNER JOIN openjson(@P1) a on a.value = cmdb.name or a.value = cmdb.sys_id

    LEFT JOIN sys_user u on u.sys_id=cmdb.u_tech_owner 

    WHERE cmdb.u_active = 'true'

    AND (cmdb.sys_class_name = 'cmdb_ci_voice_hardware' 

    OR cmdb.sys_class_name LIKE '%server%')

    AND cmdb.sys_class_name NOT LIKE '%web_server%'

    AND cmdb.sys_class_name NOT IN (

    'cmdb_ci_app_server_websphere',

    'cmdb_ci_app_server_jboss',

    'cmdb_ci_app_server_tomcat',

    'cmdb_ci_appl_license_server',

    'cmdb_ci_app_server_weblogic'

    )",

  &[&filters],

  )

there are also some alternatives to OPENJSON for instance STRING_SPLIT which may be useful if you don't already depend on serde_json.