lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
8.86k stars 908 forks source link

Using json with `CopyIn()`: `invalid input syntax for type json` #1104

Open sakishrist opened 1 year ago

sakishrist commented 1 year ago

Description

When I have a struct that gets encoded to json before being sent with COPY to postgres, the encoded part ends up looking like this: \\x5b7b22.... The server then returns an error: invalid input syntax for type json.

Initially what triggered the issue was an upgrade of pq from 1.7.0 to 1.10.6. Testing different commits, it seemed this change one to be the trigger, but that is not the root cause: https://github.com/lib/pq/pull/979

And so far I've got to this part of the code which is the one producing the double backslash: https://github.com/lib/pq/blob/master/encode.go#L593

But I don't know if this is a problem with my postgres server (version 14.2) or with the encoding or even somewhere entirely different.

What the data looks like in Go

The data represented in Go looks like this:

    type Item struct {
        ItemID string `json:"id" validate:"required" example:"26142701"`
        AddedAt int64 `json:"added_at" validate:"required" example:"1541599146000"`
    }
    type Items []Item

    items := Items{
        {
            ItemID: "5d7825b2-78bb-11ea-be60-d7d8e4e7f725",
        },
        {
            ItemID: "5d7825b2-78bb-11ea-be60-d7d8e4e7f726",
        },
        {
            ItemID: "5d7825b2-78bb-11ea-be60-d7d8e4e7f727",
        },
    }

    row := []interface{}{
        "user-id",
        "5d7825b2-78bb-11ea-be60-d7d8e4e7f720",
        "playlist-title",
        items,
    }

So items is being json encoded since the field in the database for it is of type jsonb.

(there's Value() and Scan() for the struct, just skipped them here for brevity)

The error in Go

Then I just issue a

stmt, err := tx.Prepare(pq.CopyIn(tableName, columns...))
stmt.Exec(row...)

and I get the error

{
  "Severity": "ERROR",
  "Code": "22P02",
  "Message": "invalid input syntax for type json",
  "Detail": "Token \"\\\" is invalid.",
  "Hint": "",
  "Position": "",
  "InternalPosition": "",
  "InternalQuery": "",
  "Where": "JSON data, line 1: \\...\nCOPY personal_playlists, line 1, column items: \"\\x5b7b226964223a2235643738323562322d373862622d313165612d626536302d643764386534653766373235222c226164...\"",
  "Schema": "",
  "Table": "",
  "Column": "",
  "DataTypeName": "",
  "Constraint": "",
  "File": "jsonfuncs.c",
  "Line": "621",
  "Routine": "json_ereport_error"
}

Better view of the communication with Wireshark

The data sent to the server looks like this: image

Once the copy complete command is executed the server returns this:

image

And I thought the problem is the double backslash and made a change to remove it, but then the server seemed to parse the first two hex digits, but was then complaining for the next sequence:

image

Conclusion?

That leads me to believe the server wants me to send each byte hex escaped..

Any idea what might be going on? Is the double backslash \\x... how it's supposed to work.