jimbaker / tagstr

This repo contains an issue tracker, examples, and early work related to PEP 999: Tag Strings

Communicating the use of a `sql` tag in PEP or to the community #38

Closed: jimbaker closed this issue 3 months ago

jimbaker commented 10 months ago

We already have an example of a sql tag, which can be improved by using an AST-based approach: parse with a parser like SQLglot, then either walk the tree or compile ("transpile"?) it to Python.
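To make the parse-for-context idea concrete without guessing at SQLglot's API, here is a minimal sketch that uses a toy regex tokenizer in its place. It assumes a hypothetical calling convention in which the tag receives the static strings around each interpolation (tag-string syntax itself isn't available in released Python). Each gap is replaced by a sentinel, the joined text is tokenized, and the token holding each sentinel reveals its context. `interpolation_contexts` and the sentinel scheme are illustrative names, not part of tagstr.

```python
import re
from typing import Sequence

# Toy tokenizer standing in for a real SQL parser such as SQLglot. It only
# recognizes single-quoted string literals, double-quoted identifiers,
# whitespace, runs of other characters, and (as a fallback) a lone quote.
TOKEN_RE = re.compile(r"""'(?:[^']|'')*'|"(?:[^"]|"")*"|\s+|[^\s'"]+|.""")


def interpolation_contexts(strings: Sequence[str]) -> list[str]:
    """Classify where each interpolation lands in the SQL text.

    `strings` holds the static parts (one more than there are interpolations).
    Assumes the sentinel text never appears in the static parts themselves.
    """
    sentinels = [f"__interp_{i}__" for i in range(len(strings) - 1)]
    text = strings[0] + "".join(s + part for s, part in zip(sentinels, strings[1:]))
    contexts: dict[int, str] = {}
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        if token in ("'", '"'):
            # Only an unterminated literal/identifier leaves a lone quote token
            raise ValueError("unterminated quoted token in SQL template")
        if token.startswith("'"):
            kind = "string_literal"
        elif token.startswith('"'):
            kind = "identifier"
        else:
            kind = "bare"
        for i, sentinel in enumerate(sentinels):
            if sentinel in token:
                contexts[i] = kind
    return [contexts[i] for i in range(len(sentinels))]


print(interpolation_contexts(
    ["SELECT * FROM ", " WHERE name = ", " AND note = 'prefix ", "'"]
))
# ['bare', 'bare', 'string_literal']
```

The table name and the value both come back as `bare`; telling those two positions apart is exactly what a real parser's AST would add over a tokenizer.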

Note that there were a fair number of posts on X/Twitter last week reacting to a specific slide at NextJS conf showing a JavaScript sql template tag; see for example https://twitter.com/phillip/status/1717617382867575241, where people were confusing a sql tag that properly manages interpolations with SQL injection.

Communicating that a properly written sql tag does the right thing will be an important part of the PEP work (and also complements ORM approaches like SQLAlchemy, again as demonstrated by the existing demo code).
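A minimal sketch of what "properly manages interpolations" means for SQL, assuming a hypothetical `sql` function that is handed the static strings and the interpolated values as two separate lists (a stand-in for tag-string syntax, which released Python doesn't have). Static text is treated as trusted; every interpolation becomes a `?` placeholder that the stdlib `sqlite3` driver binds, so an injection payload is just data.

```python
import sqlite3
from typing import Any, Sequence


def sql(strings: Sequence[str], values: Sequence[Any]) -> tuple[str, tuple[Any, ...]]:
    """Hypothetical sql tag: static parts are trusted SQL text, and every
    interpolated value becomes a `?` placeholder bound by the driver."""
    if len(strings) != len(values) + 1:
        raise ValueError("expected exactly one more static string than interpolations")
    return "?".join(strings), tuple(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload, supplied as "user input"
name = "alice' OR '1'='1"

# Stands in for a tag string such as sql"SELECT name FROM users WHERE name = {name}"
query, params = sql(["SELECT name FROM users WHERE name = ", ""], [name])
rows = conn.execute(query, params).fetchall()
print(query)  # SELECT name FROM users WHERE name = ?
print(rows)   # [] (the payload is compared as a plain string, so nothing matches)

# The f-string version splices the payload into the SQL and matches every row
unsafe = f"SELECT name FROM users WHERE name = '{name}' ORDER BY name"
print(conn.execute(unsafe).fetchall())  # [('alice',), ('bob',)]
```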

I looked into the tag that was used in the NextJS slide; it is https://github.com/gajus/slonik#user-content-slonik-sql-tag

Note that there are similar/related projects for JS, including

The following blog post gives good reasoning about the problems with current parameterization approaches: https://dev.to/newbie012/please-dont-manually-parameterize-your-sql-queries-3m7k
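One concrete pitfall of hand-written parameterization is keeping numbered placeholders in step with a separate parameter list as a query is edited. As a sketch, assuming PostgreSQL-style `$1`, `$2` placeholders and a hypothetical calling convention where the tag receives the static strings and the values as two lists, a tag can assign the numbers itself. `pg_sql` is an illustrative name, not an existing API.

```python
from typing import Any, Sequence


def pg_sql(strings: Sequence[str], values: Sequence[Any]) -> tuple[str, list[Any]]:
    """Hypothetical tag emitting numbered placeholders ($1, $2, ...).

    Because the tag numbers the placeholders in the same pass that collects
    the parameters, the two can never drift out of alignment.
    """
    if len(strings) != len(values) + 1:
        raise ValueError("expected exactly one more static string than interpolations")
    parts = [strings[0]]
    for n, static in enumerate(strings[1:], start=1):
        parts.append(f"${n}")
        parts.append(static)
    return "".join(parts), list(values)


query, params = pg_sql(
    ["SELECT * FROM users WHERE name = ", " AND age > ", ""], ["bob", 30]
)
print(query)   # SELECT * FROM users WHERE name = $1 AND age > $2
print(params)  # ['bob', 30]
```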

Lastly, there's this VS Code extension for formatting support: https://marketplace.visualstudio.com/items?itemName=frigus02.vscode-sql-tagged-template-literals

pauleveritt commented 10 months ago

Will this be too long for the PEP? Should the PEP point at a writeup focused on the details of this?

jimbaker commented 10 months ago

@pauleveritt For the PEP itself, it should suffice to state that tag authors can always escape, or otherwise sanitize, untrusted interpolations as appropriate to their context, and that this can be done recursively. Even if all data is trusted, this can prevent bugs. Having said that, it needs to be more than this one sentence or so, because the details matter. This is why I have been approaching this from the observation that tags target DSLs; DSLs can be parsed; and this parse can provide the necessary context on how to escape.
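A sketch of context-dependent and recursive handling, assuming a hypothetical calling convention where the tag receives the static strings and the values as two lists. Plain values are bound as parameters; values wrapped in a hypothetical `Identifier` marker are quoted as SQL identifiers (embedded double quotes doubled); and an already-built `Query` fragment is spliced in recursively with its parameters carried along. Here the context comes from each value's type rather than from parsing the SQL, which is a simplification of the parse-driven approach.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Identifier:
    """Hypothetical marker: interpolate `name` as a quoted SQL identifier."""
    name: str


@dataclass(frozen=True)
class Query:
    """A SQL fragment plus the parameters its placeholders refer to."""
    text: str
    params: tuple[Any, ...]


def sql(strings: Sequence[str], values: Sequence[Any]) -> Query:
    if len(strings) != len(values) + 1:
        raise ValueError("expected exactly one more static string than interpolations")
    text_parts = [strings[0]]
    params: list[Any] = []
    for value, static in zip(values, strings[1:]):
        if isinstance(value, Query):
            # Recursive case: splice an already-built fragment and carry its params along
            text_parts.append(value.text)
            params.extend(value.params)
        elif isinstance(value, Identifier):
            # Identifier context: double embedded double quotes, then wrap in double quotes
            text_parts.append('"' + value.name.replace('"', '""') + '"')
        else:
            # Value context: never spliced into the text, always bound as a parameter
            text_parts.append("?")
            params.append(value)
        text_parts.append(static)
    return Query("".join(text_parts), tuple(params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 25), ("bob", 35)])

where = sql(["age > ", ""], [30])
query = sql(["SELECT name FROM ", " WHERE ", ""], [Identifier("users"), where])
print(query.text)  # SELECT name FROM "users" WHERE age > ?
print(conn.execute(query.text, query.params).fetchall())  # [('bob',)]

# A hostile identifier stays a single (odd-looking) quoted name
hostile = sql(["SELECT * FROM ", ""], [Identifier('users"; DROP TABLE users; --')])
print(hostile.text)  # SELECT * FROM "users""; DROP TABLE users; --"
```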

(If we are going even further, note that escaping/sanitizing can include DoS prevention, such as limiting payload size, etc. Now it's getting long!)
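Payload limits can sit in front of any tag as a pre-check on the interpolated values. A minimal sketch; `check_payload`, `PayloadTooLarge`, and both limits are hypothetical and would need tuning per application.

```python
from typing import Any, Sequence

# Hypothetical limits; real values depend on the application.
MAX_VALUE_LEN = 1_000
MAX_TOTAL_LEN = 10_000


class PayloadTooLarge(ValueError):
    """Raised when interpolated values exceed the configured size limits."""


def check_payload(
    values: Sequence[Any],
    max_value: int = MAX_VALUE_LEN,
    max_total: int = MAX_TOTAL_LEN,
) -> None:
    """Reject oversized interpolations before a tag does any further work.

    Only str and bytes values are measured; other values are left to the
    tag's normal escaping or parameter binding.
    """
    total = 0
    for i, value in enumerate(values):
        if not isinstance(value, (str, bytes)):
            continue
        size = len(value)
        if size > max_value:
            raise PayloadTooLarge(f"interpolation {i} has length {size} (limit {max_value})")
        total += size
        if total > max_total:
            raise PayloadTooLarge(f"interpolations total length {total} (limit {max_total})")


check_payload(["alice", 30])  # passes silently
try:
    check_payload(["x" * 5_000])
except PayloadTooLarge as exc:
    print(exc)  # interpolation 0 has length 5000 (limit 1000)
```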

Otherwise it's more in the communication of the PEP. See for example https://legacy.reactjs.org/docs/introducing-jsx.html#jsx-prevents-injection-attacks or https://stackoverflow.com/questions/33644499/what-does-it-mean-when-they-say-react-is-xss-protected
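For comparison with JSX's escaping, a minimal sketch of an `html` tag, assuming a hypothetical calling convention where the tag receives the static strings and the values as two lists, and escaping every interpolation with the stdlib `html.escape`. Unlike a parse-driven tag, it applies one escaping rule everywhere.

```python
import html
from typing import Any, Sequence


def html_tag(strings: Sequence[str], values: Sequence[Any]) -> str:
    """Hypothetical html tag: static markup is trusted, interpolations are escaped.

    One rule for every interpolation is adequate for element text and quoted
    attribute values, but not for contexts such as <script> bodies or URLs;
    a real tag would parse the markup to choose escaping per context.
    """
    if len(strings) != len(values) + 1:
        raise ValueError("expected exactly one more static string than interpolations")
    parts = [strings[0]]
    for value, static in zip(values, strings[1:]):
        parts.append(html.escape(str(value), quote=True))
        parts.append(static)
    return "".join(parts)


comment = '<script>alert("pwned")</script>'
print(html_tag(["<p>", "</p>"], [comment]))
# <p>&lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;</p>
```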

Back to my observation on the sql tag and the community reaction. React has been out for 10 years, so the React community is presumably well used to the idea that interpolations can be supported safely, without HTML injection attacks. Yet there's widespread confusion about a very similar sql tag. Possibly this is because of the different syntax: JSX is an earlier approach to what is now being done with JS tagged template literals, as with Lit.

Note: the old React docs seem to advocate more for JSX as an approach, including how it prevents XSS/HTML injection; I haven't found anything comparable in the current doc set.

pauleveritt commented 10 months ago

As an update: I'm working this weekend on memoize and having a combo: a simple version in the PEP, and a longer version with some narrative in another doc. I can do the same for sanitize, if you think that approach is good.
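The details of the memoize work aren't in this thread. Assuming it means caching whatever a tag derives from the static parts of a template (which are identical on every evaluation of that template), a minimal sketch with the stdlib `functools.lru_cache` could look like this. `compile_sql` and the two-list calling convention are hypothetical.

```python
from functools import lru_cache
from typing import Any, Sequence


@lru_cache(maxsize=256)
def compile_sql(strings: tuple[str, ...]) -> str:
    """Work that depends only on the static parts of a template.

    Here it is just placeholder insertion; in a real tag this is where an
    expensive parse would happen, once per distinct template rather than
    once per call.
    """
    return "?".join(strings)


def sql(strings: Sequence[str], values: Sequence[Any]) -> tuple[str, tuple[Any, ...]]:
    # Converting the static parts to a tuple makes them hashable, so they can key the cache
    return compile_sql(tuple(strings)), tuple(values)


for name in ["alice", "bob", "carol"]:
    query, params = sql(["SELECT * FROM users WHERE name = ", ""], [name])

print(compile_sql.cache_info())
# CacheInfo(hits=2, misses=1, maxsize=256, currsize=1)
```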

If so: perhaps you and I could schedule weekly calls and knock out the remaining things on the PEP? I feel there's some zen that's still evolving in your head on some things.

Also: another possible contributor on the way (perhaps to fdom).

Also also: I had some Docker stuff ready. Put it here, or in fdom?

jimbaker commented 10 months ago

> As an update: I'm working this weekend on memoize and having a combo: a simple version in the PEP, and a longer version with some narrative in another doc. I can do the same for sanitize, if you think that approach is good.

+1

> If so: perhaps you and I could schedule weekly calls and knock out the remaining things on the PEP?

This is a good idea! I'm not otherwise using GitHub (my work is in GitLab), so it's hard for me to keep synced with it.

> I feel there's some zen that's still evolving in your head on some things.

Actually there's a Zen of Python statement that applies:

> There should be one-- and preferably only one --obvious way to do it.

My feeling is that new users of Python have a reasonable intuition that they should be able to directly use f-strings to compose HTML, SQL, bash, etc., given that it usually works, except for corner cases and injection attacks, of course! Tag strings enable this usage model without the downsides, assuming a correctly implemented tag function.
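To illustrate the bash case, here is a sketch of a hypothetical `sh` tag, assuming a calling convention where the tag receives the static strings and the values as two lists, and quoting each interpolation with the stdlib `shlex.quote`. It only builds the command string; nothing is executed.

```python
import shlex
from typing import Any, Sequence


def sh(strings: Sequence[str], values: Sequence[Any]) -> str:
    """Hypothetical shell tag: each interpolation becomes exactly one POSIX
    shell word via shlex.quote, whatever spaces or metacharacters it holds."""
    if len(strings) != len(values) + 1:
        raise ValueError("expected exactly one more static string than interpolations")
    parts = [strings[0]]
    for value, static in zip(values, strings[1:]):
        parts.append(shlex.quote(str(value)))
        parts.append(static)
    return "".join(parts)


filename = "my file; rm -rf ~"

naive = f"ls -l {filename}"  # f-string: the value leaks into shell syntax
tagged = sh(["ls -l ", ""], [filename])

# shlex.split shows the word splitting; a real shell would also treat ';'
# in the naive version as a command separator.
print(shlex.split(naive))   # ['ls', '-l', 'my', 'file;', 'rm', '-rf', '~']
print(tagged)               # ls -l 'my file; rm -rf ~'
print(shlex.split(tagged))  # ['ls', '-l', 'my file; rm -rf ~']
```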

The flip side is that complexity moves from the user of a tag function to its implementer; this is the right tradeoff. In terms of the evolving "zen" here, it's about focusing on the DSL as a target and using parsers and interpreter/compiler approaches to ensure correct interpolations.

> Also: another possible contributor on the way (perhaps to fdom).

That's really great!

> Also also: I had some Docker stuff ready. Put it here, or in fdom?

Let's add it here.

pauleveritt commented 3 months ago

We've done the work for the stripped-down section in the PEP. As mentioned in #40, longer tutorials are being worked on as part of a "site" that is on a different timeline than the PEP, so I'm closing this.