sunng87 / pgwire

PostgreSQL wire protocol implemented as a rust library.
Apache License 2.0
521 stars 40 forks source link

Example of ExtendedQueryHandler #99

Open osawyerr opened 1 year ago

osawyerr commented 1 year ago

Hi there, I've been trying to get the ExtendedQueryHandler working. My implementation of the SimpleQueryHandler works fine with psql, however no client I've tried work with the ExtendedQueryHandler yet. There doesn't seem to be any example I can use. I tried to use Greptime as an example however clients like DataGrip cannot connect successfully to Greptime either.

Is it possible to provide an example of how the ExtendedQueryHandler should work please?

sunng87 commented 1 year ago

@osawyerr I have a basic example of ExtendedQueryHandler for sqlite https://github.com/sunng87/pgwire/blob/master/examples/sqlite.rs#L217 and our integration tests https://github.com/sunng87/pgwire/blob/master/tests-integration/test-server/src/main.rs#L123. But note that the abstraction of ExtendedQueryHandler is highly coupled with postgres' implementation and sqlite may not 100% fit.

Do you have any error log or message from DataGrip ?

osawyerr commented 1 year ago

Yes thanks, I just saw the test-server after I posted my issue. I'll have a look. I'll try using the ExtendedQueryHandler as a simple passthrough proxy to a real postgres instance on the backend to see if I can get a basic implementation of ExtendedQueryHandler to work. You're right alot of the SQL IDEs will try to select from from real postgres tables and call their functions, etc. In DataGrip theres no error message. It connected fine, then seemed to be sending empty SQL to the on_execute(). But that might be a bug my implementation.

portal statement is SET extra_float_digits = 3
original do_query query - SET extra_float_digits = 3
do_query query after replacement - SET extra_float_digits = 3

on_execute for ExtendedQueryHandler
portal statement is SET application_name = ''
original do_query query - SET application_name = ''
do_query query after replacement - SET application_name = ''

do_describe query is select version()
on_execute for ExtendedQueryHandler
portal statement is select version()

original do_query query - select version()
do_query query after replacement - select version()

on_execute for ExtendedQueryHandler
portal statement is SET application_name = 'CLion 2023.1.1'
original do_query query - SET application_name = 'CLion 2023.1.1'
do_query query after replacement - SET application_name = 'CLion 2023.1.1'

do_describe query is 
on_execute for ExtendedQueryHandler
portal statement is 
osawyerr commented 1 year ago

Thanks that was helpful. I've got it working with both DBeaver and DataGrip as well.

There's one thing thats a concern - this code from GrepTime for converting portal values to strings seems quite brittle since one has to manually cater for all the postgres types. It caused endless issues in our app because oids were not mapped, etc. We had to add that in. Is there a more robust way of doing this?

fn parameter_to_string(portal: &Portal<(Statement, String)>, idx: usize) -> PgWireResult<String> {
    // the index is managed from portal's parameters count so it's safe to
    // unwrap here.
    let param_type = portal.statement().parameter_types().get(idx).unwrap();
    match param_type {
        &Type::VARCHAR | &Type::TEXT => Ok(format!(
            "'{}'",
            portal.parameter::<String>(idx)?.as_deref().unwrap_or("")
        )),
        &Type::BOOL => Ok(portal
            .parameter::<bool>(idx)?
            .map(|v| v.to_string())
            .unwrap_or_else(|| "".to_owned())),
        &Type::INT4 => Ok(portal
            .parameter::<i32>(idx)?
            .map(|v| v.to_string())
            .unwrap_or_else(|| "".to_owned())),
        &Type::INT8 => Ok(portal
            .parameter::<i64>(idx)?
            .map(|v| v.to_string())
            .unwrap_or_else(|| "".to_owned())),
        &Type::FLOAT4 => Ok(portal
            .parameter::<f32>(idx)?
            .map(|v| v.to_string())
            .unwrap_or_else(|| "".to_owned())),
        &Type::FLOAT8 => Ok(portal
            .parameter::<f64>(idx)?
            .map(|v| v.to_string())
            .unwrap_or_else(|| "".to_owned())),
        _ => Err(PgWireError::UserError(Box::new(ErrorInfo::new(
            "ERROR".to_owned(),
            "22023".to_owned(),
            "unsupported_parameter_value".to_owned(),
        )))),
    }
}
sunng87 commented 1 year ago

This function is not required for your application. It's just a workaround due to limitation of our query plan implementation so we have to serialize binary types to string to reconstruct the query string.

Depends on how you implement your extended query, you may deal with parameters differently. Can you provide me more context?

osawyerr commented 1 year ago

I'm doing the same thing. I am converting the portal values from Portal::parameters() -> &Vec<Option<Bytes>> to strings so that a sql string can be executed in our backend. I was asking if there is a transparent way doing this conversion rather than having to explicitly use foo_type:

portal.parameter::<foo_type>(idx)

Hmmm thinking about it, I'm not sure what I'm asking is possible.

saeedzareian commented 1 year ago

Hello, I was trying to add pgwire as a wrapper for Duckdb using Sqlite example and had the same challenge.

The simple query method works well like what @osawyerr said and psql works. However, most of the tools that use JDBC (i.e. DataGrip/Intellij/Clion or Retool) have preferedQueryMethod=extended by default. Therefore, I think we need solve this if it hasn't already.

Let me show you my findings using Clion IDE interactions with our example code:

login info: LoginInfo { user: Some("saeed"), database: Some("saeed"), host: "127.0.0.1" }

extended query "SET extra_float_digits = 3"
ignoring query: "SET extra_float_digits = 3"
extended query "SET application_name = ''"
ignoring query: "SET application_name = ''"

do_describe Portal(Portal { name: "POSTGRESQL_DEFAULT_NAME", statement: StoredStatement { id: "POSTGRESQL_DEFAULT_NAME", statement: "select version()", parameter_types: [] }, parameter_format: UnifiedText, parameters: [], result_column_format: UnifiedText })
extended query "select version()"

extended query "SET application_name = 'CLion 2023.1.3'"
ignoring query: "SET application_name = 'CLion 2023.1.3'"

do_describe Portal(Portal { name: "POSTGRESQL_DEFAULT_NAME", statement: StoredStatement { id: "POSTGRESQL_DEFAULT_NAME", statement: "", parameter_types: [] }, parameter_format: UnifiedText, parameters: [], result_column_format: UnifiedText })
extended query ""
empty query, ignoring

As far as I understood, it seems that Clion (my IDE) is asking to reset the stored statement called POSTGRESQL_DEFAULT_NAME and that could be why it sends empty query. If you notice about the above, it also tried to set application_name as empty string at first, too. My other theory is that it is just default value and sending the same default name/id all the time.

I am not sure what is the best response to return. I tried OK 1 , OK 0 and successful empty query and it didn't work yet. I need to find a reference of the protocol to handle this case better. The Sqlite example doesn't work with IDEs too.

PS: I ignored SET commands to handle them later as per my application logic later.

sunng87 commented 1 year ago

@saeedzareian for extended query, we have them supported as in wire protocol layer. However, you will need to deal with SQL at the moment.

To find a right response for SET statement, you can use wireshark to capture how Postgres server responds it. Please let me know if you have the answer.

saeedzareian commented 1 year ago

@saeedzareian for extended query, we have them supported as in wire protocol layer. However, you will need to deal with SQL at the moment.

Yes. I agree that it is supported in the protocol but there is that second empty extended query that is the challenge. If you see @osawyerr had the same issue on his July 2 comment.

No problems with SET for now. Let's ignore it for now.

The connection from JDBC connectors fails at the beginning right after the empty query. No sure why and what is the solution.

saeedzareian commented 1 year ago

I asked Chatgpt to tell me what are the steps in pgjdbc when using the ExtendedQuery. This is what it gave me:

The PostgreSQL JDBC driver (pgjdbc) can operate in two modes: simple query and extended query. Extended query mode is generally more efficient and flexible, as it allows for prepared statements and parameterized queries. When you use prepared statements in Java, pgjdbc switches to extended query mode.

The sequence of packets sent by the client (pgjdbc in this case) to the PostgreSQL server and vice versa during extended query execution is typically as follows:

Parse Message (P): This message is sent by the client to the server to prepare a statement for execution. It includes the query string, possibly with parameter placeholders.

PreparedStatement preparedStatement = connection.prepareStatement("SELECT * FROM table WHERE column = ?");

Parse Complete Message (1): This message is sent from the server back to the client to indicate that the statement has been successfully parsed and is ready for execution.

Bind Message (B): This message is sent by the client to the server to bind parameter values to a prepared statement.

preparedStatement.setString(1, "value");
Bind Complete Message (2): This message is sent from the server back to the client to indicate that binding is complete and the statement is ready for execution.

Describe Message (D) (Optional): This message may be sent by the client to get information about the result set returned by the query. This step is optional.

Row Description Message (T): If the Describe message was sent, this message is sent from the server back to the client, describing the format of the result set.

Execute Message (E): This message is sent by the client to execute the prepared and bound statement.

ResultSet resultSet = preparedStatement.executeQuery();

Data Row Messages (D): These messages are sent from the server to the client, containing the rows that satisfy the query. These messages continue until all rows are sent.

Command Complete Message (C): This message is sent from the server to indicate that the query has been successfully executed.

Ready for Query Message (Z): This message is sent from the server to indicate that it's ready for a new query cycle.

Sync Message (S): This message is usually sent by the client after each series of extended-query messages to synchronize the state of the connection and to confirm that everything went as expected.

Here's an example Java code snippet using pgjdbc to perform an extended query:

Copy code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ExtendedQueryExample {
    public static void main(String[] args) {
        try {
            // Initialize connection
            Connection connection = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydatabase",
                    "username",
                    "password"
            );

            // Prepare statement (Parse)
            PreparedStatement preparedStatement = connection.prepareStatement(
                    "SELECT * FROM mytable WHERE mycolumn = ?"
            );

            // Bind parameters (Bind)
            preparedStatement.setString(1, "some_value");

            // Execute query (Execute)
            ResultSet resultSet = preparedStatement.executeQuery();

            // Process results
            while (resultSet.next()) {
                System.out.println("Column value: " + resultSet.getString("mycolumn"));
            }

            // Close resources
            resultSet.close();
            preparedStatement.close();
            connection.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

Note: The sequence and message details are simplified for the sake of understanding, and real-world interactions might involve more steps or nuances.


For my case, I think guess there is something up with the SET. I have used some breakpoints in the library and the disconnect happens where I have SET application name and that's why the last one is empty query. I will update here as soon as I learn more.