psycopg / psycopg2

PostgreSQL database adapter for the Python programming language
https://www.psycopg.org/
Other
3.37k stars 504 forks source link

`copy_from` documentation doesn't mention '\'-escaped characters #397

Closed wehlutyk closed 8 years ago

wehlutyk commented 8 years ago

First of all thanks for the great adapter :)

I was discovering the use of cursor.copy_from and the documentation doesn't have any mention of the characters PostgreSQL needs to escape when using COPY (see the table in the "File Formats" > "Text Formats" section).

This surfaced to me as a psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0x80 because my input contained a backslash followed by digits (\200...), which PostgreSQL reads as an octal-encoded character. Particularly cryptic to debug as it leads one to look into encoding issues with the connection.

Would it make sense to either mention the special characters in the documention or link to the COPY documention from copy_from (not only from copy_expert)?

wehlutyk commented 8 years ago

Awesome, that'll do the the trick thanks :)

rpeys commented 5 years ago

I'm having this exact issue. Even with the documentation update, I'm not sure how to work around it. I am trying to read ascii-encoded txt files into a utf8-encoded psql table. The cur.copy_from() command throws an error when it encounters a backslash followed by digits ("\S\400"). What's the best way to handle this? Is there a better option than editing the raw data file? Will telling psycopg that the original file is encoded in ASCII help?

This is my current code:

import psycopg2
conn = psycopg2.connect(database  = 'omop' ,user = 'rp', options='-c search_path=cdm,ibc,public;')
cur = conn.cursor()

file = "myfile.txt"
f = open(file)
cur.copy_from(f, 'my_table', sep = '|')

Thank you!

dvarrazzo commented 5 years ago

@rpeys this is not a psycopg issue: you would find the same problem if you used the COPY command from psql. I can suggest you to try entering in a table the data you expect in the database and then use COPY ... TO STDOUT to see how postgres expect it to be encoded. I wouldn't be surprised if you had to massage a bit the input.

anandrajakrishnan commented 3 years ago

@wehlutyk and @dvarrazzo Can you please explain the fix? I didn't understand what to do to circumvent the case when backslash is accompanied with numerals in the file that I want to load using copy_from function. I have the exact same issue that looks like to be solved in this thread. Thanks in advance.