jackc / pgx

PostgreSQL driver and toolkit for Go
MIT License
10.83k stars 846 forks source link

pgtype.ArrayCodec should support non-comma delimiter #2086

Open wchargin opened 4 months ago

wchargin commented 4 months ago

Is your feature request related to a problem? Please describe.

For many Postgres types, array elements are separated by commas:

$ psql -c "SELECT array[array[g,g],array[g,g]] FROM (VALUES('g'::text)) g(g)"
     array     
---------------
 {{g,g},{g,g}}
(1 row)

However, some types use a different delimiter. In particular, PostGIS geometry and geography types use a colon as the delimiter:

$ psql -c "SELECT array[array[g,g],array[g,g]] FROM (VALUES('GEOMETRYCOLLECTION EMPTY'::geometry)) g(g)"
                                       array                                       
-----------------------------------------------------------------------------------
 {{010700000000000000:010700000000000000}:{010700000000000000:010700000000000000}}
(1 row)

(Note that, in this multidimensional array, the delimiter separates both the base elements and the constituent 1D arrays.)

This is controlled by the DELIMITER option to CREATE TYPE: see, e.g., CREATE TYPE geometry (..., delimiter = ':').

I am using twpayne/pgx-geom to encode and decode my PostGIS values with pgx. So far, this only supports scalars. I see that pgtype's ArrayCodec should make it nice and easy to support arrays, too, and indeed the obvious approach works out of the box for the binary format, but for the text format it fails with "can't scan into dest[0]: encoding/hex: invalid byte: U+003A ':'". This is because ArrayCodec gives the whole 0107...:0107... string to the element codec, even though that represents two values.

Describe the solution you'd like

Perhaps pgtype.Type could have a new public field like Delimiter string, such that the empty string (default) means "use comma as delimiter" but any other string would be used as the delimiter instead. Then, ArrayCodec could read this value from the ElementType *pgtype.Type to encode and decode. If I understand correctly, this should be a backward-compatible change.

Describe alternatives you've considered

Alternately, the Delimiter field could exist on the ArrayCodec itself. In some ways, that seems more natural, since it affects only the array parsing, but it would deviate from Postgres's ontology, where (quoting the same CREATE TYPE docs) "the delimiter is associated with the array element type, not the array type itself".

I have of course also considered hand-rolling the parsers and encoders. It's not too bad, but I'd probably end up with a weird half-and-half solution using the ArrayCodec implementation for the binary format but a custom implementation for the text format. Workable, but certainly a shame when all that's needed is a single character to be handled differently!

Additional context

None comes to mind.

jackc commented 4 months ago

This seems like a good idea to me.

Not sure on whether this configuration should go on pgtype.Type or pgtype.ArrayCodec.

Either way, the pgx.Conn.LoadType and pgx.Conn.LoadTypes methods should also be updated to load this value.

wchargin commented 4 months ago

Good point about LoadType. That made me think to check what other types have typdelim <> ',', and it seems that the standard type box exhibits this bug in pgx without any external dependency on PostGIS:

err := pool.AcquireFunc(ctx, func(conn *pgxpool.Conn) error {
    formats := []struct{code int16; name string}{
        {code: pgx.BinaryFormatCode, name: "binary"},
        {code: pgx.TextFormatCode, name: "text"},
    }
    for _, format := range formats {
        rows, err := conn.Query(ctx, "WITH g(g) AS (VALUES('((1,1),(2,2))'::box)) SELECT array[array[g,g],array[g,g]] FROM g", pgx.QueryResultFormats{format.code})
        if err != nil {
            return err
        }
        for rows.Next() {
            var dst [][]pgtype.Box
            if err := rows.Scan(&dst); err != nil {
                return fmt.Errorf("format=%v: failed to scan: %w", format.name, err)
            }
            fmt.Printf("format=%v: row: %T %v\n", format.name, dst, dst)
        }
    }
    return nil
})
fmt.Printf("AcquireFunc error: %v\n", err)
format=binary: row: [][]pgtype.Box [[{[{2 2} {1 1}] true} {[{2 2} {1 1}] true}] [{[{2 2} {1 1}] true} {[{2 2} {1 1}] true}]]
AcquireFunc error: format=text: failed to scan: can't scan into dest[0]: unexpected trailing data: }

(On my system, it is just box, _box, geometry, _geometry, geography, and _geography that have non-comma delimiters.)