pganalyze / pg_query

Ruby extension to parse, deparse and normalize SQL queries using the PostgreSQL query parser
BSD 3-Clause "New" or "Revised" License

PgQuery::Node: Add inner and inner= helpers to modify inner object #306

Closed: lfittl closed this 9 months ago

lfittl commented 10 months ago

Because of how oneof messages work in protobuf, it's a bit tedious to make modifications to a Node object, in particular when changing the type of the inner object held within the Node in a generic code path (i.e. with a dynamic inner type). Previously this required first knowing the name of the new inner object type, and then using public_send to set the new value.

To help, this introduces the new "inner" and "inner=" methods on PgQuery::Node, which get/set the inner object directly, avoiding the use of public_send.

This will be of particular help for anyone utilizing the walk! API to make modifications to the query tree. To illustrate, an example showing how to utilize this is added to the treewalker spec.


Additionally, this makes utilizing the walk! treewalker easier for the common use case of working with the node directly (without using the parent or location information).

lfittl commented 9 months ago

Per request (in a different channel), I've run a quick benchmark to verify that this doesn't negatively impact the time the treewalker takes; it does not:

Before (6 longest SQL statements in fingerprint test file):

```
Benchmarking time for parse+walk...
                       user     system      total        real
4a6db94fbada8341  52.937340   0.678991  53.616331 ( 53.624522)
b7dbf54ce62af0ca   0.241984   0.002727   0.244711 (  0.244712)
18e71bc17baea13b   0.006921   0.000028   0.006949 (  0.006943)
3a5494404465d0f9   0.014623   0.000034   0.014657 (  0.014658)
1cca3f304295181c   0.009612   0.000013   0.009625 (  0.009623)
f936eab75b8c1b90   0.000537   0.000001   0.000538 (  0.000538)

Benchmarking memory for parse+walk...
Calculating -------------------------------------
    4a6db94fbada8341    13.806B memsize (     0.000  retained)
                        49.838M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    b7dbf54ce62af0ca    76.937M memsize (     0.000  retained)
                        88.861k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    18e71bc17baea13b     2.268M memsize (     0.000  retained)
                        10.540k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    3a5494404465d0f9     4.793M memsize (     0.000  retained)
                        17.635k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    1cca3f304295181c     3.159M memsize (     0.000  retained)
                        12.182k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    f936eab75b8c1b90   158.198k memsize (     0.000  retained)
                       926.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
```
After (6 longest SQL statements in fingerprint test file):

```
Benchmarking time for parse+walk...
                       user     system      total        real
4a6db94fbada8341  50.811444   0.857045  51.668489 ( 51.743262)
b7dbf54ce62af0ca   0.241765   0.002221   0.243986 (  0.244304)
18e71bc17baea13b   0.006827   0.000019   0.006846 (  0.006843)
3a5494404465d0f9   0.014364   0.000009   0.014373 (  0.014371)
1cca3f304295181c   0.009489   0.000013   0.009502 (  0.009501)
f936eab75b8c1b90   0.000513   0.000004   0.000517 (  0.000515)

Benchmarking memory for parse+walk...
    4a6db94fbada8341    13.806B memsize (     0.000  retained)
                        49.838M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    b7dbf54ce62af0ca    76.937M memsize (     0.000  retained)
                        88.862k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    18e71bc17baea13b     2.268M memsize (     0.000  retained)
                        10.541k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    3a5494404465d0f9     4.793M memsize (     0.000  retained)
                        17.636k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    1cca3f304295181c     3.159M memsize (     0.000  retained)
                        12.183k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
    f936eab75b8c1b90   158.278k memsize (     0.000  retained)
                       927.000  objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
```
Benchmark script:

```ruby
# spec/benchmark.rb
#
# Needs "s.add_development_dependency 'benchmark-memory', '~> 0'" in the gemspec
require 'pg_query'
require 'json'
require 'benchmark'
require 'benchmark/memory'

def fingerprint_defs
  # Sort input SQL by size (largest first), since those would be most interesting usually
  @fingerprint_defs ||= JSON.parse(File.read(File.join(__dir__, 'files/fingerprint.json'))).sort_by { |d| -d['input'].size }
end

puts 'Benchmarking time for parse+walk...'
Benchmark.bmbm do |x|
  fingerprint_defs.each do |testdef|
    x.report(testdef['expectedHash']) { PgQuery.parse(testdef['input']).walk! { |_, _, _, _| } }
  end
end

puts 'Benchmarking memory for parse+walk...'
Benchmark.memory do |x|
  fingerprint_defs.each do |testdef|
    x.report(testdef['expectedHash']) { PgQuery.parse(testdef['input']).walk! { |_, _, _, _| } }
  end
end
```