timescale / timescaledb-parallel-copy

A binary for parallel copying of CSV data into a TimescaleDB hypertable
https://www.timescale.com/
Apache License 2.0

bytea fields are wrongly handled #61

Closed: lservini closed this issue 2 years ago

lservini commented 2 years ago

From case 8581:

However, we discovered a bug that prevents us from using it: incorrect values are inserted into
bytea columns. Our transactions table has a "hash" column of type bytea. I created a CSV export from
the Postgres DB. For instance, in Postgres the hash is "\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050",
but when it is inserted into the hypertable with the timescaledb-parallel-copy utility, what is actually stored is
hex(hash), that is, the hex encoding of the literal's text rather than of the bytes it represents. After the import, I see in the hypertable
`\x5c7831363066326230353737323835303263313762306637643538383361393462326463633563386265303133636633623831633066653664646436623737303530`
which is hex('\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050').
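
The relationship between the two values can be checked offline. The following is a minimal Go sketch, not part of the utility: `hexOfLiteralText` is a hypothetical helper name, and it only shows that hex-encoding the characters of the exported literal (backslash and `x` included) yields a value starting with `\x5c78`, like the one reported above.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexOfLiteralText (hypothetical helper) renders, in Postgres hex bytea
// notation, the bytes of the literal's own characters. This is what a bytea
// column would show if the text `\x16...` were stored verbatim instead of
// being parsed as a hex-format bytea literal.
func hexOfLiteralText(literal string) string {
	return `\x` + hex.EncodeToString([]byte(literal))
}

func main() {
	// Value as it appears in the CSV export from the issue.
	exported := `\x160f2b057728502c17b0f7d5883a94b2dcc5c8be013cf3b81c0fe6ddd6b77050`

	// Prints the double-encoded form; it begins with 5c ('\') and 78 ('x').
	fmt.Println(hexOfLiteralText(exported))
}
```

The stored value is therefore twice as long as the original plus the four hex digits for the `\x` prefix, which is a quick way to spot affected rows.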

jchampio commented 2 years ago

Looks like lib/pq's COPY support is limited to text input, which it always escapes. There was a raw CopyData API added by Cockroach but no tests, so I don't really know how public clients would actually use it. lib/pq is in maintenance mode and doesn't appear to be very healthy. :grimacing:
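
To illustrate what "always escapes" means for a bytea literal, here is a simplified Go sketch of COPY text-format escaping. It is an assumption-laden stand-in, not lib/pq's actual code: `escapeCopyText` is a hypothetical name, and it only covers the backslash, tab, newline, and carriage-return rules of the PostgreSQL COPY text format.

```go
package main

import (
	"fmt"
	"strings"
)

// copyTextEscaper applies COPY text-format escapes in a single pass, so the
// backslashes it emits are not escaped a second time.
var copyTextEscaper = strings.NewReplacer(
	`\`, `\\`,
	"\t", `\t`,
	"\n", `\n`,
	"\r", `\r`,
)

// escapeCopyText (hypothetical) mimics a driver that treats every field as
// plain text and escapes it before sending it over COPY.
func escapeCopyText(field string) string {
	return copyTextEscaper.Replace(field)
}

func main() {
	// The leading backslash of the hex bytea literal gets doubled.
	fmt.Println(escapeCopyText(`\x160f2b05`))
}
```

If the doubled backslash is not undone on the receiving side, the field no longer starts with `\x` and would plausibly be read as literal text, which is consistent with the symptom reported in this issue.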

jchampio commented 2 years ago

Should be fixed by #63; please reopen and tag me if you find otherwise.