pganalyze / pg_query_go

Go library to parse and normalize SQL queries using the PostgreSQL query parser
BSD 3-Clause "New" or "Revised" License
663 stars 79 forks source link

How to identify whether a SQL is read-only or not? #60

Closed FX-HAO closed 2 years ago

FX-HAO commented 2 years ago

I noticed there is a C function CommandIsReadOnly and the implementation is in Postgres repo as below https://github.com/postgres/postgres/blob/bde60daa0ec9d0ee8dc79c0caa2c94d128527867/src/backend/tcop/utility.c#L100-L127

But I don't figure out how to call it in Golang. I think I probably need to make a little bit code. Could you please point me to the right direction? thanks

sql := "SELECT 1"
result, err := pg_query.Parse(sql)
if err != nil {
    panic(err);
}
for _, stmt := range result.Statements {
    isReadOnly := pg_query.CommandIsReadOnly(stmt) # this doesn't work
}
lfittl commented 2 years ago

That function assumes a planned statement, however pg_query operates on pre-parse analysis parse trees, so it couldn't be called even if you were to write a Go wrapper around it.

My recommendation would be to write similar logic to what Postgres is doing internally, to detect the cases where SELECT statements are likely to be modifying (i.e. when they are a CTE with DML). Note that SELECT statements can call functions that then make modifications, and that is something you could not detect from a simple parse tree.

Out of curiosity, what's your use case? I wonder if there is a more specific solution to the problem at hand.

FX-HAO commented 2 years ago

Thanks for your suggestion. Our use case is that we want to implement a perceived zero-downtime database migration plan. We can use CDC to migrate the data from the old database to the new one. but during the switching over, we need to allow read-only access to the old database while the CDC process completed (by comparing LSN). So one of the requirements is that we need to identify if a SQL is write or read-only.

lfittl commented 2 years ago

Thanks for your suggestion. Our use case is that we want to implement a perceived zero-downtime database migration plan. We can use CDC to migrate the data from the old database to the new one. but during the switching over, we need to allow read-only access to the old database while the CDC process completed (by comparing LSN). So one of the requirements is that we need to identify if a SQL is write or read-only.

Ah, that makes sense. Thanks for the context!

I do think that will be difficult to do 100% reliable on the basis of the query text alone (as mentioned earlier), but if you can be reasonably certain that you don't have functions that make writes, you could essentially re-implement what the CommandIsReadOnly logic in Postgres tries to do.

It might be a bit easier to prototype this in Ruby by the way (i.e. with https://github.com/pganalyze/pg_query), the Go library does tend to be a bit more complicated to experiment with, just by the nature of having a compile cycle between testing different logic.

For looking at the query tree recursively, you could take some inspiration from what the Ruby library does for implementing the tables function: https://github.com/pganalyze/pg_query/blob/main/lib/pg_query/parse.rb#L98

I unfortunately don't have the time to help on an implementation here, but its essentially just a coding problem based on the nested structs and detecting their types / setting a flag based on whether you find anything resembling a DML operation within the query tree.