larskanis / importtest

from bitbucket

PG::TextEncoder::Array wrong encode for UTF-8 text arrays #252

Closed: larskanis closed this issue 8 years ago

larskanis commented 8 years ago

Original report by Ilya Shavrin (Bitbucket: [Ilia Shavrin](https://bitbucket.org/Ilia Shavrin)).


Hi,

I've noticed an incorrect serialization result for UTF-8 string arrays. It could lead to incorrect behavior elsewhere.

```ruby
2.3.1 :009 > PG::TextEncoder::Array.new(name: "text[]", delimiter: ',').encode(["a", "š"])
 => "{a,\xC5\xA1}"
```

```ruby
#!/usr/bin/env ruby

require 'pg'
Encoding.default_external = Encoding::UTF_8

decoder = PG::TextDecoder::Array.new(name: "text[]", delimiter: ',')
encoder = PG::TextEncoder::Array.new(name: "text[]", delimiter: ',')

# Round-trip a UTF-8 string array through the text array encoder and decoder.
arr = ["a", "š"]
arr2 = decoder.decode encoder.encode(arr)

p "Should be equal: #{arr == arr2}"
```
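Why that comparison fails can be sketched in plain Ruby, without pg. The snippet assumes, as the maintainer's reply in this thread describes, that the round trip hands back the same bytes labelled `Encoding::BINARY`; it simulates that with `String#b` instead of calling the encoder and decoder.

```ruby
# Simulate a binary-labelled round trip: same bytes, but every element is
# tagged ASCII-8BIT (Encoding::BINARY).
original      = ["a", "š"]
round_tripped = original.map(&:b)

# The raw bytes are identical...
same_bytes = original.map(&:bytes) == round_tripped.map(&:bytes)

# ...but String#== also takes the encoding into account for non-ASCII content,
# so "š" (UTF-8) and "\xC5\xA1" (BINARY) differ. Pure-ASCII "a" still matches.
same_strings = original == round_tripped

# Relabelling (not transcoding) the bytes as UTF-8 restores equality.
repaired = round_tripped.map { |s| s.dup.force_encoding(Encoding::UTF_8) }

puts "same bytes:   #{same_bytes}"             # true
puts "same strings: #{same_strings}"           # false
puts "repaired:     #{original == repaired}"   # true
```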
larskanis commented 8 years ago

Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).


Duplicate of #230.

larskanis commented 8 years ago

Original comment by Lars Kanis (Bitbucket: larskanis, GitHub: larskanis).


Ruby stores the encoding individually on each String object, whereas the client encoding is a per-connection setting. By default, `PG::TextEncoder::Array#encode` encodes the strings to their raw byte representation and labels the result `Encoding::BINARY`:

```ruby
decoder.decode encoder.encode(["ä".encode("ibm850"), "ä".encode("iso-8859-1")])  # => ["\x84", "\xE4"]
```
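To make the per-string encoding point concrete, here is a plain-Ruby sketch. `naive_binary_array_literal` is a hypothetical helper, not part of pg: it only approximates the default behaviour by concatenating each element's raw bytes into an ASCII-8BIT string, and it skips the quoting and escaping a real array encoder performs. ISO-8859-1 stands in for the second encoding to keep the sketch to common transcoders.

```ruby
# Each Ruby String carries its own encoding, so the same character can be
# stored as different bytes in different strings.
utf8   = "ä"                        # UTF-8 bytes:      C3 A4
latin1 = "ä".encode("ISO-8859-1")   # ISO-8859-1 bytes: E4

# Hypothetical helper (not part of pg): copy each element's raw bytes into an
# ASCII-8BIT result without converting anything. Quoting/escaping is omitted.
def naive_binary_array_literal(strings)
  body = strings.map(&:b).join(",".b)
  "{".b + body + "}".b
end

literal = naive_binary_array_literal([utf8, latin1])
p literal                              # => "{\xC3\xA4,\xE4}"
p literal.encoding == Encoding::BINARY # => true
```

The result mixes a two-byte UTF-8 `ä` with a one-byte ISO-8859-1 `ä`, and nothing in the BINARY label says which bytes belong to which encoding.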

Alternatively, you can pass a specific output encoding (typically the client encoding of the connection) as the second argument to `#encode`. All input strings are then converted to that encoding:

```ruby
decoder.decode encoder.encode(["ä".encode("ibm850"), "ä".encode("iso-8859-1")], Encoding::UTF_8)  # => ["ä", "ä"]
```
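The conversion step can be sketched in plain Ruby as well. `transcoded_array_literal` is a hypothetical helper, not pg's implementation: it transcodes every element with `String#encode` into one target encoding before joining, which is the idea behind passing an output encoding to `#encode`. Quoting and escaping are omitted.

```ruby
# Hypothetical helper (not pg's implementation): transcode every element into
# one target encoding before joining, so the output is consistent no matter
# which encoding each input string carried. Quoting/escaping is omitted.
def transcoded_array_literal(strings, target)
  body = strings.map { |s| s.encode(target) }.join(",".encode(target))
  "{".encode(target) + body + "}".encode(target)
end

inputs = ["ä", "ä".encode("ISO-8859-1")]  # same character, different bytes

utf8_literal   = transcoded_array_literal(inputs, Encoding::UTF_8)
latin1_literal = transcoded_array_literal(inputs, Encoding::ISO_8859_1)

p utf8_literal == "{ä,ä}"   # => true
p latin1_literal.bytes      # => [123, 228, 44, 228, 125]
```

`String#encode` rewrites the bytes so the characters stay the same, while `force_encoding` only relabels existing bytes; relabelling is safe only when the bytes are already valid in the target encoding.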

I hope this helps.

larskanis commented 8 years ago

Original comment by Ilya Shavrin (Bitbucket: [Ilia Shavrin](https://bitbucket.org/Ilia Shavrin)).


Thanks for the link. I missed it somehow.