lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
8.86k stars 908 forks source link

pq.Array double quotes custom value when underlying type is []byte #1070

Closed emreisikligil closed 2 years ago

emreisikligil commented 2 years ago

When a custom type implementing database/sql/driver.Valuer has []byte as the underlying type, pq.Array treats its value as string and double quotes it while building the SQL array argument. It causes Postgresql to return invalid byte sequence for encoding "UTF8" error when the value contains an invalid UTF-8 sequence.

The following code piece is an example that causes this issue.

package main

import (
    "database/sql"
    "database/sql/driver"
    "fmt"
    "log"

    "github.com/lib/pq"
)

type MyType []byte

func (mt MyType) Value() (driver.Value, error) {
    return []byte(mt), nil
}

func main() {
    connStr := "user=test_user password=test_pass port=5441 dbname=test sslmode=disable"
    db, err := sql.Open("postgres", connStr)
    if err != nil {
        log.Fatal(err)
    }

    arr := []MyType{MyType([]byte{0x80}), MyType([]byte{0x81})}

    // CREATE TABLE temp_table(
    //  col bytea
    // )
    _, err = db.Query("SELECT col FROM temp_table WHERE col = ANY($1)", pq.Array(arr))
    fmt.Println(err)
}

When I print the value generated by pq.Array, it prints an array of strings while I would expect an array of hex codes.

fmt.Println(pq.Array(arr).Value())

// Prints: {"�","�"}
// Expected: {\\x80,\\x81}
kszafran commented 2 years ago

Isn't this the correct behavior? When using database/sql you work with a text-based connection to Postgres. So Value() should be returning a text representation, while you're returning 0x80 and 0x81 (some special chars probably). I think the double quotes are fine. If I remember correctly, both {\\x80, \\x81} and {"\\x80","\\x81"} are correct for Postgres.

emreisikligil commented 2 years ago

Double quotes are fine as long as the byte sequence is encoded properly, eg. {"\\x8081","\\x808080"}. GenericArray treats byte sequences as valid UTF-8 sequences which may not be correct. The problem starts here in my opinion. If it fallbacks to appendValue function below, it looks like it would get encoded properly.

cbandy commented 2 years ago

It's been a long time, but I'm pretty sure @kszafran is correct. Any driver.Valuer needs to return the exact correct text representation.

emreisikligil commented 2 years ago

driver.Valuer returns a driver.Value which has the following description.

// Value is a value that drivers must be able to handle.
// It is either nil, a type handled by a database driver's NamedValueChecker
// interface, or an instance of one of these types:
//
//   int64
//   float64
//   bool
//   []byte
//   string
//   time.Time

So, a Valuer can return int64, not its text representation, and the driver should handle encoding of it. In my opinion, it should be the case for []byte too. It can check if it is already a text representation of a byte sequence. If not, the driver should handle text encoding of that byte sequence.

cbandy commented 2 years ago

[]byte is the type Go uses to represent memory. This package has made the decision to interpret []byte to mean that the value is in text format and needs no further processing.

Your driver.Valuer is wrong for this driver.

I believe you can use pq.ByteaArray or pq.Array([][]byte) to get the representation you want without implementing driver.Valuer.

emreisikligil commented 2 years ago

Thanks for the info. I am closing the ticket.