Open abonander opened 2 years ago
A thought I had was to not have a sql()
method on InjectionSafe
and treat it like a true marker trait, then have Q: AsRef<str> + InjectionSafe
as the bound on query()
.
The main idea with that would be to make it more obvious that the parameter is still supposed to be "some sort of string" but with an extra marker trait attached.
I'm stumbling across this right now, so let me add another use case. It's pretty common to want to sort a query based on user preferences or request parameters. The easiest way to do this is to dynamically build the ORDER BY
clause. But that becomes basically impossible with the current API.
I had to construct a truly grotesque SQL abomination to try to work around this, where the user's preferred sort order is expressed as a three-element array of an enum type:
ORDER BY
CASE $2[0]
WHEN 'original' THEN COALESCE( a.sortable_name, a.name )
WHEN 'transcripted' THEN COALESCE( a.transcripted_sortable_name, a.transcripted_name )
WHEN 'translated' THEN COALESCE( a.translated_sortable_name, a.translated_name )
END ASC,
CASE $2[1]
WHEN 'original' THEN COALESCE( a.sortable_name, a.name )
WHEN 'transcripted' THEN COALESCE( a.transcripted_sortable_name, a.transcripted_name )
WHEN 'translated' THEN COALESCE( a.translated_sortable_name, a.translated_name )
END ASC,
CASE $2[2]
WHEN 'original' THEN COALESCE( a.sortable_name, a.name )
WHEN 'transcripted' THEN COALESCE( a.transcripted_sortable_name, a.transcripted_name )
WHEN 'translated' THEN COALESCE( a.translated_sortable_name, a.translated_name )
END ASC
I'm not even sure if this works, because now I have other lifetime issues. Notably, the parameter passed to .bind()
also have to live for as long as the query itself, which is also a pain. It'd be really nice to be able to pass in cloned bind params in this case.
I'm happy to open another issue for that if that would be helpful.
Oops, ignore the bit about .bind()
. I can pass cloned arguments to it.
@autarch you might also be interested in this proposal https://github.com/launchbadge/sqlx/issues/1488
The main idea is to allow adding/removing parts of the query at runtime while still providing compile-time checks.
@autarch you might also be interested in this proposal #1488
The main idea is to allow adding/removing parts of the query at runtime while still providing compile-time checks.
Thanks. This looks like an interesting approach.
FWIW, I initially tried using the compile-time query macros, but I gave up because there's no non-hacky way to support custom types (like CITEXT
) with those macros. Obviously that's OT for both of these issues.
I'm gonna add a bit of info as question that sparked this issue was asked by me.
Bit that led me to investigation of query
and execute
apis is:
PgPoolOptions::new()
.max_connections(self.pool_size)
.after_connect(move |conn| {
let schema = schema.clone();
Box::pin(async move {
conn.execute(format!("SET search_path TO {}", schema).as_str())
.await?;
Ok(())
})
})
.connect_with(
PgConnectOptions::new()
.host(&self.host)
.port(self.port)
.username(&self.username)
.password(&self.password)
.database(&self.database),
)
.await
I can guarantee that schema
is safe, as it's provided by person deploying application on a machine, not an end user. AssertInjectionSafe
still holds my hand, but it's clear in it's intent and I can live with it.
Per my use case, and I failed to clarify that initially, it would be beneficial if I could
sqlx::query("SET search_path TO $1")
.bind(schema.clone())
.execute(conn)
.map(|r| r.map(|_| ()))
.boxed()
which currently causes postgres connections to hang indefinitely
or
conn.execute(format!("SET search_path TO {}", schema)).map(|r| r.map(|_| ())).boxed()
which can't compile
or
set schema in PgConnectOptions
as per jdbc spec (options param in https://jdbc.postgresql.org/documentation/head/connect.html)
This is more of a nitpick about how code looks and how much extras are there rather than a real issue.
We actually just released support for the options
parameter, either in the URL or set on PgConnectOptions
.
The reason the version using sqlx::query()
hangs indefinitely is because Postgres doesn't support SET
in prepared statements, so your .after_connect()
is kicking back an error which is causing Pool
to drop the connection and try again and again.
We actually just released support for the options parameter, either in the URL or set on PgConnectOptions.
If that's true I'm in awe 😄
Take a look at the Scala library Relate and its InterpolatedQuery
type.
It's basically a string-like type for a known-safe query fragment that also carries a list of values to bind to the parameters. In Rust, maybe something like
struct InterpolatedQuery {
query: Cow<'static, str>,
params: Vec<Box<dyn Encode>>,
}
You could have a safe constructor from &'static str
, a scarier-named constructor from String
, and a format_query!()
macro that works a lot like query!()
but concatenates InterpolatedQuery
args into the literal query and all other types as bound parameters:
let where: InterpolatedQuery = format_query!("id = ?", id); // or maybe `format_query!("id = {id}")` and insert the placeholder
let sort = InterpolatedQuery::sql("foo");
sqlx::query(format_query!("select foo from bar where {where} order by {sort}"))
I found a workaround for the issue I had with wanting to generate a query dynamically and then return a stream from calling fetch()
, and I wanted to document it here in case anyone else comes across this issue:
use futures_util::stream::BoxStream;
use sqlx::PgPool;
#[derive(Clone, Debug)]
pub struct DB {
pool: PgPool,
}
impl DB {
pub fn get_item<'a>(
&'a self,
some_id: &str,
select: &'a mut String,
) -> BoxStream<'a, Result<Item, sqlx::Error>> {
*select = format!("...", ...);
sqlx::query_as::<_, Item>(select)
.bind(some_id)
.fetch(&self.pool)
}
}
Then the caller can easily arrange to keep the string used for the select
alive as long as the returned stream.
I'm not sure if this proposal would cover it, but I'd love to be able to specify that the table identifiers were injection-safe and to provide the code myself for generating those identifiers.
PostgreSQL's CREATE EXTENSION
can provide a WITH SCHEMA schema_name
clause. This means you can install an extension to a non-standard schema. Anything that uses this is no longer accessible with SQLx because you can't for example (that I can see) create an executable that does something like...
./load_into_sql --postgis_schema foo
And have the schema for PostGIS determined at runtime.
I found a workaround for the issue I had with wanting to generate a query dynamically and then return a stream from calling
fetch()
, and I wanted to document it here in case anyone else comes across this issue: ... Then the caller can easily arrange to keep the string used for theselect
alive as long as the returned stream.
I had the exact same issue, thanks for that idea. I do wish I had a way to make my own query with format!
(I need to dynamically select a bunch of fields) and be able to return a stream without this hack tho.
As per @luke-biel's earlier comment:
PgPoolOptions::new() .max_connections(self.pool_size) .after_connect(move |conn| { let schema = schema.clone(); Box::pin(async move { conn.execute(format!("SET search_path TO {}", schema).as_str()) .await?; Ok(()) }) }) .connect_with( PgConnectOptions::new() .host(&self.host) .port(self.port) .username(&self.username) .password(&self.password) .database(&self.database), ) .await
I can guarantee that
schema
is safe, as it's provided by person deploying application on a machine, not an end user.AssertInjectionSafe
still holds my hand, but it's clear in it's intent and I can live with it.
I am using something similar where I input conn.execute(format!("SELECT * FROM {}", TABLE_CONST).as_str())
where TABLE_CONST
is the name of a table set as const TABLE_CONST: &'static str = "some-table"
in my codebase ... I want to ask/clarify if this is sql injection safe?. I still use .bind()
for user parameters though
SQLx desperately needs some facility like this, however it's spelled. I don't think the current API actually has much security benefit - in the most common case, you're relying on someone who is unaware of SQL injection issues being repelled by the code smell of having to take a reference to a constructed string. I think programmers that combine such a lack of awareness with such delicate sensibilities are rare.
On the other hand, this pattern makes many more sophisticated uses an enormous pain. Progressive construction of query strings becomes impossible in many scenarios because of the lifetime on the Query
object. This is especially true for streaming queries where you really want to return an object from your construction function, which is now bounded by the lifetime on the input string. I see the suggestion above to pass a mutable String
into all such query functions to control the lifetime of the query, but that's a terrible way to warp every query API function to paper over an issue in an underlying library.
The consequence for us is that we have literally thousands of lines of SQL queries duplicated between match branches that only vary by a where clause or join.
We have worked around this interally using the async_stream crate, which can be used to trivially create anonymous Stream
implementations using familiar generator syntax. This solves the issue by wrapping the sqlx stream in another stream that can own the query (and any bind parameters):
fn example(connection: &mut MySqlConnection) -> impl Stream<Item=sqlx::Result<MySqlRow>> + '_ {
let sql = "SELECT 1".to_owned();
stream! {
for await row in sqlx::query(&sql).fetch(connection) {
yield row;
}
}
}
If the return type needs to be a boxed stream then this unfortunately adds a level of indirection, but under our workload we haven't noticed any significant performance impact from this.
I'd love to see robust sanitisation of table and schema names for querybuilder. I can make whole sanitisation by myself, but... this would mean I don't need query builder.
An example for tables for SQlite is below. Basically, it's possible to insert all sorts of characters. to name a few: \"()∂ę
sqlite> create table "ta ble\"" name" (id INTEGER PRIMARY KEY NOT NULL);
sqlite> .schema
CREATE TABLE IF NOT EXISTS "ta ble\"" name" (id INTEGER PRIMARY KEY NOT NULL);
sqlite> select * from sqlite_schema;
table|ta ble\" name|ta ble\" name|2|CREATE TABLE "ta ble\"" name" (id INTEGER PRIMARY KEY NOT NULL)
PS: nevermind, I used sqlparser
to make a simple select which won't break. It was a bit bulky, but all names are sanitized for all dialects at once.
In case it's helpful for other people running into this issue who need a workaround for the time being: My issues were all dealing with passing bind parameters to dynamically generated queries, since I'm generating queries with the query builder. I was having big issues with the lifetimes, but I got it working with something like this:
async fn query_with(
&self,
query: FancyQuery,
bind: impl for<'q> FnOnce(Query<'q, Sqlite, SqliteArguments<'q>>) -> Query<'q, Sqlite, SqliteArguments<'q>>,
) -> Result<Vec<Row>> {
let query_text = query.generate();
bind(sqlx::query(&query_text)).fetch_all(&self.db).await
}
and then use it like:
let rows = db.query_with(fancy_query, |q| q.bind(1).bind("hi"));
The closure lets you provide the binds in the appropriate lifetime, and avoids dealing with traits for big tuples etc. that might be necessary with passing binds as parameters.
Status Quo
At the core of the
sqlx::query()
family of functions lies a contentious design choice that goes all the way back to the initial prototypes of SQLx:The query string must be a string slice, usually
'static
but can also be borrowed. This was a deliberate choice, because allowingString
to be passed directly makes it extremely tempting to simplyformat!()
user input into the query string instead of using bind parameters, which would easily introduce SQL injection vulnerabilities into the application. For example, to co-opt XKCD #327:Then suddenly, you've lost all student data for the year because the query that got executed looked like this:
(Postgres would reject the attempt to prepare a query string with more than one statement, but MySQL and SQLite wouldn't.)
The idea is that by requiring dereffing to a string slice, it will at least make the code start to smell a bit and hopefully lead to the user re-evaluating their implementation:
The "easier" path is intended to be using bind parameters, which by design do not introduce injection vulnerabilities because the database knows not to interpret bound values as part of the SQL. By comparison, this implementation is completely injection-safe, and is the approach that the API design was intended to push users towards:
However, it does have its downsides.
The intention is not at all clear in the design.
Time and again, users have opened issues or questions on the discussion board or asked on Discord about why the API is designed this way, and it gets very tedious to explain the reasoning (I admittedly have copy-pasted large chunks of this first part from a Q&A so I didn't have to write it again). This isn't a complaint about people asking questions, of course, it's more an expression of annoyance that the question needs to be asked in the first place.
Of course, we could just explain it better in the docs, which we should be doing anyways. But it's still an issue that it's even necessary.
The design doesn't give off very obvious signals when the user is doing something wrong.
All you have to do for the function to accept a
String
is to add a&
and let auto-deref do the rest. Yeah, it's a code smell, but it reeks more of poor API design than injection vulnerability. I personally would complain about an API so inflexible that all I could pass is&str
; what if I want to passString
orArc<str>
?And if all a user has to do is add a
&
, is there really any opportunity for them to gain insight into why they shouldn't be doing that?There's legitimate use-cases for passing a constructed
String
if the user knows what they're doing.This design decision is notorious for making it very difficult to implement query builders on top of SQLx, because of the lifetime issues that it introduces: https://github.com/launchbadge/sqlx/issues/1428
Proposal
I don't want to just allow
String
to be passed directly, because I think it's really important for the design of an API to avoid straight-up handing the user a footgun. We can warn about injection vulnerabilities in the docs all we want, but as long as it's the path of least resistance, some people will still end up shooting themselves. And allowing that to happen just isn't the Rust way.I think we could copy a page from
std
's notebook here and introduce something similar toUnwindSafe
: https://doc.rust-lang.org/stable/std/panic/trait.UnwindSafe.htmlExisting usage with
&'static str
would remain unchanged:However, usage of dynamic query strings would now be forced to assert that they do not contain SQL injection vulnerabilities, and in exchange would be freed of lifetime constraints:
Of course, it is still possible to bypass using
AssertInjectionSafe
, but it's a more significant code smell (and involves deliberately introducing a memory leak):And ideally, on the way the compiler would have informed them that their query string needs to implement
InjectionSafe
which would have prompted them to check the docs and learn about SQL injection and why what they're trying to do is deliberately annoying.