jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.83k stars 845 forks source link

PGX seems not calling the Scanner interface with JSONB column #2146

Closed ludusrusso closed 1 month ago

ludusrusso commented 1 month ago

Describe the bug When implementing scanner and valuer interface of a type used with pgx, it seems that scanner interface is not used to read data in case of JSONB columns.

It is called when the column is not JSONB, I've tried with a bytea/varchar columns and it works perfectly.

To Reproduce

I've created the following repo to reproduce the issue: https://github.com/ludusrusso/pgx-protobuf

As you can see if you run go run main.go, the program panics with

INFO[0000] calling valuer                               
FATA[0000] can't get tests: can't scan into dest[1]: json: cannot unmarshal string into Go struct field Test.timestamp of type timestamppb.Timestamp 
exit status 1

The problem is due to the fact that pgx is tring to unmarshal test with the json packages, while the struct has been marshalled with protojson.

As you can see from logs, the Scan method is never called.

func (t *Test) Scan(value interface{}) error {
    logrus.Info("calling scanner")

    bytes, ok := value.([]byte)
    if !ok {
        return errors.New("type assertion to []byte failed")
    }

    return protojson.Unmarshal(bytes, t)
}

func (t *Test) Value() (driver.Value, error) {
    logrus.Info("calling valuer")

    return protojson.Marshal(t)
}

Expected behavior Scan method should be called.

Actual behavior Scan method is not called.

Version

jackc commented 1 month ago

Any chance you can reduce the example a bit? This case should be working. I suspect the issue is some misinteraction between the protobuf type and sqlc, but I don't use either of those so your example is rather opaque to me.

ludusrusso commented 1 month ago

Hi @jackc, thanks for your reply, I've done some investigation. Here is a reproduction of the same problem without both protobuf and sqlc.

package main

import (
    "context"
    "database/sql/driver"
    "encoding/json"
    "fmt"
    "time"

    "github.com/jackc/pgx/v5/pgxpool"
    "github.com/sirupsen/logrus"
)

func main() {
    db, err := pgxpool.New(context.Background(), "postgres://postgres:postgres@localhost:5432/pgx-proto")
    if err != nil {
        logrus.Fatalf("can't connect to db: %v", err)
    }

    defer db.Close()

    write(db)
    read(db)

    return
}

func write(db *pgxpool.Pool) {
    // create a table
    _, err := db.Exec(context.Background(), `CREATE TABLE IF NOT EXISTS tests (
        id SERIAL PRIMARY KEY,
        data JSONB
    )`)
    if err != nil {
        logrus.Fatalf("can't create table: %v", err)
    }

    // insert a row using a custom type
    _, err = db.Exec(context.Background(), `INSERT INTO tests (data) VALUES ($1)`, NewTimestamp(time.Now()))
    if err != nil {
        logrus.Fatalf("can't insert row: %v", err)
    }
}

func read(db *pgxpool.Pool) {
    // select multiple rows
    rows, err := db.Query(context.Background(), `SELECT * FROM tests`)
    if err != nil {
        logrus.Fatalf("can't select rows: %v", err)
    }
    defer rows.Close()

    for rows.Next() {
        var r Row
        err := rows.Scan(&r.ID, &r.Data)
        if err != nil {
            logrus.Fatalf("can't scan row: %v", err)
        }
        logrus.Infof("timestamp: %v", r.Data.Time())
    }

    if err := rows.Err(); err != nil {
        logrus.Fatalf("rows error: %v", err)
    }
}

type Row struct {
    ID   int        `json:"id"`
    Data *Timestamp `json:"data"` // <- this is a pointer
}

type Timestamp struct {
    Seconds int64 `json:"seconds"`
}

func NewTimestamp(t time.Time) Timestamp {
    return Timestamp{
        Seconds: t.Unix(),
    }
}

func (t Timestamp) Time() time.Time {
    return time.Unix(t.Seconds, 0)
}

func (t *Timestamp) Scan(value interface{}) error {
    fmt.Println("calling scanner")

    var valBytes []byte

    switch v := value.(type) {
    case []byte:
        valBytes = v
    case string:
        valBytes = []byte(v)
    default:
        return fmt.Errorf("can't convert %T to Test", value)
    }

    var tm time.Time
    err := json.Unmarshal(valBytes, &tm)
    if err != nil {
        return fmt.Errorf("can't unmarshal %s to time.Time: %v", valBytes, err)
    }

    *t = NewTimestamp(tm)
    return nil
}

func (t Timestamp) Value() (driver.Value, error) {
    fmt.Println("calling valuer")
    return json.Marshal(t.Time())
}

And here is the output I get in the same environment described above:

calling valuer
FATA[0000] can't scan row: can't scan into dest[1]: json: cannot unmarshal string into Go value of type main.Timestamp 
exit status 1

Some context

The issue seems releted to the fact that r.Data is a pointer, and it seems that rows.Scan(&r.ID, &r.Data) is not able to detect the Scanner interface of &r.Data that is of time **Timestamp.

Anyway, if you replace the JSON column with a bytea or a varchar column, all works perfectly. As following

    _, err := db.Exec(context.Background(), `CREATE TABLE IF NOT EXISTS tests (
        id SERIAL PRIMARY KEY,
        data VARCHAR
    )`)

Is this behavior intentional? In case you confirm this is not correct, I'd like to try to propose a PR to fix this!

Thanks!

jackc commented 1 month ago

Is this behavior intentional? In case you confirm this is not correct, I'd like to try to propose a PR to fix this!

It's an unintended side effect of automatic JSON unmarshalling of json/jsonb. pgx checks for sql.Scanner first, but in this case the target does not implement sql.Scanner. So the JSON unmarshalling path is chosen.

I suppose the solution would be to not just check for sql.Scanner but also if there is any way to get to a sql.Scanner before proceeding with JSON unmarshalling. JSONCodec.PlanScan is the place to start if you do want look into it. But this code is already pretty difficult to reason about.

ludusrusso commented 1 month ago

I see! It seems to me that the best approach is to try to scan recursively if no type are matched until we found a .Scan interface, like stdlib does here:

https://github.com/golang/go/blob/b521ebb55a9b26c8824b219376c7f91f7cda6ec2/src/database/sql/convert.go#L432-L439

This also should handle this case: https://github.com/jackc/pgx/blob/2ec900454bfe65daa9648488e93f7627c26b810c/pgtype/json.go#L125-L142

But, as I understand the code, it requires a little bit of refactoring in order to achieve this.

jackc commented 1 month ago

Something like that anyway. It's just hard to reason about all the possible ways scanning can take place. We don't want to break any existing use cases.

ludusrusso commented 1 month ago

I agree, I'll try to fix this without a big refactoring in order to avoid introducing new issues. Give me some days to try to submit a PR!