porsager / postgres

Postgres.js - The Fastest full featured PostgreSQL client for Node.js, Deno, Bun and CloudFlare
The Unlicense
7.53k stars 275 forks source link

Generically parameterizing queries #940

Closed jtlapp closed 2 months ago

jtlapp commented 2 months ago

Hello. I'm trying to figure out how to complete this function:

async executeQuery(
    sql: ReturnType<typeof postgres>,
    query: string,
    args: { [key: string]: string | number | boolean }
  ): Promise<TBD> {
    // TBD
  }

query contains named placeholders for parameters, and args provides the values of those placeholders by key, which provides the placeholder name. I'm generically representing queries and dynamically providing their arguments.

I'm looking for the function's implementation and return type.

Is this possible to do in a safe way, with proper literal escaping?

(In case you're questioning the need to do this, it's for a series of benchmarking tests, each potentially written for a different platform, in a different framework, in a different language. Rather than copying the queries from implementation to implementation and maintaining them across implementations, I'm centralizing them. I don't even know what queries I'll end up using in the end, so I want to be able to centrally change the queries for all frameworks all at once as I experiment. The above is for the Deno implementation. Moreover, I won't be using the exact function above; it's just to teach me how to do this.)

jtlapp commented 2 months ago

If we had a method like file that accepted strings instead of files, that would do the job:

const result = await sql.file('query.sql', ['Murray', 68])
tilemanrenovations commented 2 months ago

typeof true); Try it out.

jtlapp commented 2 months ago

typeof true); Try it out.

Was there a copy-paste error? I'm grateful for any help!

jtlapp commented 2 months ago

It turns out that sql.unsafe(query, args) does exactly what I want. This isn't apparent from the documentation, which says nothing about how args are substituted into the query in this case. I was assuming that with a name like "unsafe" that it was literal substitution, not SQL escaping. From my experimentation, I found that this is not the case.

For others looking to solve this problem, here's what you need to know:

For completeness, can we assume that options are any of these?

porsager commented 2 months ago

How about a PR to the readme then @jtlapp 😉 You'd be in the best position to make it since you know what's missing 👍 I'll help ensure it's correct if you start one.

The reasoning for the naming is that there is no way to ensure you haven't combined the query string yourself, so it's potentially unsafe even though you're using parameters.

query accepts positional parameters of the form $1, $2, etc. (It does not also accept ${param} parameters.)

There's nothing preventing a user from sticking a template literal for the query string in unsafe, where they can then use ${...}. The problem with that is that then they're again in the territory of just concatting strings. Hence unsafe.

args is an array of values that are to be substituted for the positional parameters.

This happens on the database level, Postgres.js does nothing to them. The string is sent as is, and the parameters are sent on the side as is (with the exception of serialization). I think conceptually looking at them like variables that the PostgreSQL planner uses in the query is a better mental picture.

sql.unsafe will escape the values for safe embedding in SQL (though it may defer the work to postgres itself).

The parameters are not escaped, they are transferred as is to PostgreSQL that then uses them in the $1, $2, etc positions.

For completeness, can we assume that options are any of these?

No, these are options for the connection / instance.

sql.unsafe currently responds to two options that are mutually exclusive:

porsager commented 2 months ago

I'm looking for the function's implementation and return type.

Hehe.. Sorry, I actually disregarded this question as a Typescript issue because of this code sample.

Is this possible to do in a safe way, with proper literal escaping?

No it's not. Any time you can pass a regular string how should the library know what the user has cooked up?

(In case you're questioning the need to do this, it's for a series of benchmarking tests, each potentially written for a different platform, in a different framework, in a different language. Rather than copying the queries from implementation to implementation and maintaining them across implementations, I'm centralizing them. I don't even know what queries I'll end up using in the end, so I want to be able to centrally change the queries for all frameworks all at once as I experiment. The above is for the Deno implementation. Moreover, I won't be using the exact function above; it's just to teach me how to do this.)

I think that wouldn't give you the true picture of the various libraries - I think benchmarks ought to be written in the idiomatic style of each library. Even so, for Postgres.js you should add the { prepare: true }option to get closer to the way queries would be performed when using it the idiomatic way.

Be curious to see what you find. Here's some prior art if you're interested:

https://porsager.github.io/imdbench/sql.html https://github.com/porsager/postgres-benchmarks

jtlapp commented 2 months ago

@porsager thank you for the detailed explanation! I posted a PR.

I'm primarily benchmarking concurrency, so the computational expense of the query shouldn't matter much, though memory usage may still be an issue.

I have found it challenging to find much that's helpful about what sorts of query combinations I need to create to get a good sense of the relative ability frameworks and platforms have for supporting concurrency. So I plan to play around. I want it easy and quick to do so, particularly because I'm building images for deployment to a Kubernetes cluster, where I'll be running the experiments.

But I do appreciate the reminder that the most reliable measures will come from using real-world implementations of the queries. Once I understand what sorts of queries I need, I'll likely then hard-code them to get trustworthy test results. And then as a bonus I'll also have meaningful latency benchmarks.

Thank you for your benchmark links, and thanks again for your assistance! I already have this working.

porsager commented 2 months ago

@jtlapp sounds great :)

In the case of Postgres.js there is much more than computational. For instance using prepared statements implicitly is a huge gain, as well as ensuring pipelining the queries will work correctly. This all works the best when using Postgres.js properly with tagged template literals, but as mentioned, should be fine with { prepare: true } for .unsafe.

Would love a heads up here if the benchmarks are something you'll share.