will / crystal-pg

a postgres driver for crystal
BSD 3-Clause "New" or "Revised" License
462 stars 77 forks source link

Exception sending query with bytea[] binary array type? #267

Open compumike opened 1 year ago

compumike commented 1 year ago

I ran into an Unhandled exception: invalid byte sequence for encoding "UTF8": 0x00 (PQ::PQError) when trying to do a query that involves the postgres BYTEA binary type. Any idea why it is trying to encode this as UTF8?

Simplified example code:

array_of_binaries = Array(Bytes?).new
array_of_binaries << "Hello".to_slice
array_of_binaries << nil
array_of_binaries << "world".to_slice
array_of_binaries << Bytes[0, 255] # <=== This line breaks the query.

my_db.query("SELECT * FROM UNNEST($1::bytea[])", args: [array_of_binaries]) do |rs|
  rs.each do
    puts rs.read(Bytes?)
  end
end

As shown, I get this backtrace:

Unhandled exception: invalid byte sequence for encoding "UTF8": 0x00 (PQ::PQError)
  from lib/pg/src/pq/connection.cr:215:7 in 'handle_error'
  from lib/pg/src/pq/connection.cr:198:7 in 'handle_async_frames'
  from lib/pg/src/pq/connection.cr:174:7 in 'read'
  from lib/pg/src/pq/connection.cr:169:7 in 'read'
  from lib/pg/src/pq/connection.cr:447:31 in 'expect_frame'
  from lib/pg/src/pq/connection.cr:446:5 in 'expect_frame'
  from lib/pg/src/pg/statement.cr:19:5 in 'perform_query'
  from lib/db/src/db/statement.cr:93:9 in 'perform_query_with_rescue'
  from lib/db/src/db/statement.cr:80:7 in 'query:args'
  from lib/db/src/db/pool_statement.cr:29:30 in 'query:args'
  from lib/db/src/db/query_methods.cr:46:7 in 'query:args'
  from lib/db/src/db/query_methods.cr:61:7 in '__crystal_main'

If I comment out the line with the Bytes[0, 255], then it works fine:

Bytes[72, 101, 108, 108, 111]

Bytes[119, 111, 114, 108, 100]

Any ideas on where I can look to fix this? Postgres bytea docs suggest it can handle arbitrary binary data, so I suspect an issue on the crystal-pg side, but am not sure where to look. Thank you for any pointers.

will commented 1 year ago

So I think the problem is that in this example it's not just bytea types but arrays of bytea, and before arrays get sent up to postgres they get encoded as strings

    def self.encode_array(array)
      String.build(array.size + 2) do |io|
        encode_array(io, array)
      end
    end

and postgres doesn't like 0x00 in a string. It might be possible (but I haven't thought it through at all yet, so I don't know) to special case arrays of bytea and send them with the binary protocol instead of text.

compumike commented 1 year ago

Thank you so much @will! That was a very helpful pointer.

As a result, I found the following workaround by assuming that crystal-pg sends a string-encoded binary, and so I can pre-encode my data using the bytea hex format.

This worked for me:

array_of_binaries = Array(Bytes?).new
array_of_binaries << "Hello".to_slice
array_of_binaries << nil
array_of_binaries << "world".to_slice
array_of_binaries << Bytes[0, 255] # <=== This line breaks the query.

array_of_pg_hex_strings : Array(String?) = array_of_binaries.map do |bytes|
  next nil if bytes.nil?

  String.build do |str|
    str << "\\x"
    bytes.each do |byte|
      str << sprintf("%02x", byte)
    end
  end
end

my_db.query("SELECT * FROM UNNEST($1::bytea[])", args: [array_of_pg_hex_strings]) do |rs|
  rs.each do
    puts rs.read(Bytes?)
  end
end

resulting in:

Bytes[72, 101, 108, 108, 111]

Bytes[119, 111, 114, 108, 100]
Bytes[0, 255]

as expected.