Reduce SQL sanitizer allocations

ninedraft commented 1 month ago

https://github.com/jackc/pgx/issues/2124

Result:

Main optimizations:

extensive usage of sync.Pool for byte buffers, lexers and parsed query structs
append-style string formatters for int64, float64 and time.Time + bytes.Buffer.AvailableBuffer
rework of QuoteString and QuoteBytes to append-style (with tests for backwards compatibility)

Misc changes:

benchmarks for Query.Sanitize and SanitizeSQL functions
a tiny script for generation of benchmark reports for selected commits and diff (using benchstat)
fuzzing of QuoteString and QuoteBytes (I did'n find any problems for 1h of fuzzing, but you can never be sure for 100%)

Since optimization is an extremely hard problem, I think it's worth checking some more benchmarks.

I would be very grateful for your opinion on this and recommendations/advice, @jackc @vtolstov

ninedraft commented 1 month ago

benchmark diffs for concrete optimisations

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/internal/sanitize
cpu: Apple M1
              │ benchmarks/0_base_case.bench │     benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │    benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench    │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │            sec/op            │   sec/op     vs base                │      sec/op        vs base                │   sec/op     vs base                │   sec/op     vs base                │        sec/op         vs base                │               sec/op                vs base                │
Sanitize-8                       718.2n ± 1%   578.8n ± 1%  -19.41% (p=0.000 n=10)         439.9n ± 0%  -38.74% (p=0.000 n=10)   413.6n ± 4%  -42.42% (p=0.000 n=10)   397.1n ± 1%  -44.72% (p=0.000 n=10)            403.6n ± 1%  -43.81% (p=0.000 n=10)                          400.8n ± 2%  -44.20% (p=0.000 n=10)
SanitizeSQL-8                    2.089µ ± 0%   1.956µ ± 0%   -6.37% (p=0.000 n=10)         1.828µ ± 0%  -12.49% (p=0.000 n=10)   1.812µ ± 1%  -13.28% (p=0.000 n=10)   1.789µ ± 1%  -14.36% (p=0.000 n=10)            1.670µ ± 0%  -20.06% (p=0.000 n=10)                          1.673µ ± 0%  -19.91% (p=0.000 n=10)
geomean                          1.225µ        1.064µ       -13.13%                        896.8n       -26.79%                  865.5n       -29.34%                  842.8n       -31.19%                           820.9n       -32.98%                                         818.8n       -33.15%

              │ benchmarks/0_base_case.bench │     benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │    benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench    │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │             B/op             │    B/op      vs base                │       B/op         vs base                │    B/op      vs base                │    B/op      vs base                │         B/op          vs base                │                B/op                 vs base                │
Sanitize-8                       1488.0 ± 0%    528.0 ± 0%  -64.52% (p=0.000 n=10)          472.0 ± 0%  -68.28% (p=0.000 n=10)    456.0 ± 0%  -69.35% (p=0.000 n=10)    424.0 ± 0%  -71.51% (p=0.000 n=10)             424.0 ± 0%  -71.51% (p=0.000 n=10)                           424.0 ± 0%  -71.51% (p=0.000 n=10)
SanitizeSQL-8                    2216.0 ± 0%   1256.0 ± 0%  -43.32% (p=0.000 n=10)         1200.0 ± 0%  -45.85% (p=0.000 n=10)   1184.0 ± 0%  -46.57% (p=0.000 n=10)   1152.0 ± 0%  -48.01% (p=0.000 n=10)             552.0 ± 0%  -75.09% (p=0.000 n=10)                           552.0 ± 0%  -75.09% (p=0.000 n=10)
geomean                         1.773Ki         814.4       -55.15%                         752.6       -58.55%                   734.8       -59.54%                   698.9       -61.51%                            483.8       -73.36%                                          483.8       -73.36%

              │ benchmarks/0_base_case.bench │    benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │   benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench   │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │          allocs/op           │ allocs/op   vs base                │     allocs/op      vs base                │ allocs/op   vs base                │ allocs/op   vs base                │      allocs/op        vs base                │             allocs/op               vs base                │
Sanitize-8                       11.000 ± 0%   7.000 ± 0%  -36.36% (p=0.000 n=10)          4.000 ± 0%  -63.64% (p=0.000 n=10)   3.000 ± 0%  -72.73% (p=0.000 n=10)   2.000 ± 0%  -81.82% (p=0.000 n=10)             2.000 ± 0%  -81.82% (p=0.000 n=10)                           2.000 ± 0%  -81.82% (p=0.000 n=10)
SanitizeSQL-8                     26.00 ± 0%   22.00 ± 0%  -15.38% (p=0.000 n=10)          19.00 ± 0%  -26.92% (p=0.000 n=10)   18.00 ± 0%  -30.77% (p=0.000 n=10)   17.00 ± 0%  -34.62% (p=0.000 n=10)             10.00 ± 0%  -61.54% (p=0.000 n=10)                           10.00 ± 0%  -61.54% (p=0.000 n=10)
geomean                           16.91        12.41       -26.62%                         8.718       -48.45%                  7.348       -56.55%                  5.831       -65.52%                            4.472       -73.56%                                          4.472       -73.56%

jackc commented 1 month ago

LGTM. But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.

vtolstov commented 1 month ago

lgtm, i'm try to check on my hot path in next few days.

vtolstov commented 1 month ago

In my tests i don't saw any issues.

ninedraft commented 1 month ago

@jackc

But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.

It would be very much appreciated if you could suggest someone I can tag on this issue. I'm also in the process of writing more tests for SQL injection + more fuzzing

jackc commented 1 month ago

It would be very much appreciated if you could suggest someone I can tag on this issue.

I wish I could. Unfortunately, I don't know of anyone.

I'm also in the process of writing more tests for SQL injection + more fuzzing

👍 It's been a couple weeks since I reviewed the code, so I can review it again with fresh eyes now. It's not quite as good as multiple reviewers, but at least it will be multiple reviews.

I'll wait until you add the additional tests.

jackc / pgx

Reduce SQL sanitizer allocations #2136