mplourde / rpostgresql

Automatically exported from code.google.com/p/rpostgresql

Encoding of strings is not set properly by dbGetQuery() #52

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Configure the PostgreSQL client and server to use UTF-8 encoding (this is the 
default).  To verify:
    dbGetQuery(con, "SHOW server_encoding");
       # "UTF-8"
    dbGetQuery(con, "SHOW client_encoding");
       # "UTF-8"

2. Create a UTF-8 string.  One method is to start with a latin1 string created 
in Microsoft Word, then convert it to UTF-8 via iconv():
    a <- "Test UTF-8: ±€£¥©®™≠≤≥÷×∞µαβπΩ∑"
    aa <- iconv(a, from="latin1", to="UTF-8");
    Encoding(aa);
        # "UTF-8"

3. Write this string to a PostgreSQL table using dbGetQuery or dbWriteTable.  
For example:
    b <- data.frame(col1=c("simple string", aa), col2=c(1,2), stringsAsFactors=FALSE);
    Encoding(b$col1)
        # [1] "unknown" "UTF-8"  
    dbWriteTable(con, "junk", b, overwrite=TRUE, append=FALSE);

4. Read the table back, for example using dbGetQuery:
    c <- dbGetQuery(con, "SELECT * FROM junk");

If you check the encoding of the string column, you will see that it is set to 
"unknown":
    Encoding(c$col1);
        # [1] "unknown" "unknown"

This is incorrect: the second string should have been marked "UTF-8".
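
For anyone trying to see the effect without a database, here is a minimal base-R sketch of what the encoding mark is. It uses no RPostgreSQL calls; the variable names x and y are made up for illustration. It assumes that the bytes coming back from the server are already valid UTF-8 and that only the mark is missing, which is what the report above suggests.

```r
# Build a non-ASCII string and make sure it is stored and marked as UTF-8.
x <- enc2utf8("\u00b1\u20ac")

# Copy it and drop the mark, which imitates what the query result looks like.
y <- x
Encoding(y) <- "unknown"

# The mark is only metadata: the underlying bytes are the same.
same_bytes <- identical(charToRaw(x), charToRaw(y))

print(Encoding(x))   # "UTF-8"
print(Encoding(y))   # "unknown"
print(same_bytes)    # TRUE
```

A string marked "unknown" is interpreted in the native encoding of the session, so on a non-UTF-8 locale (as is typical on Windows) the unmarked copy may display or convert incorrectly even though its bytes are intact.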

To work around this, you have to explicitly set the encoding on every string 
column returned by every PostgreSQL command that returns strings from the 
database.  In the above example you must follow the dbGetQuery() with:

    Encoding(c$col1) <- "UTF-8"

After that, the strings in the data.frame are handled correctly.
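
Since the workaround has to be repeated for each string column, it can be wrapped in a small helper. The sketch below is only an illustration: mark_utf8 is a hypothetical function name, not part of RPostgreSQL or DBI, and it assumes every character column in the result really contains UTF-8 bytes (i.e. client_encoding is UTF-8). The data.frame res stands in for a dbGetQuery() result so the example runs without a connection.

```r
# Hypothetical helper (not part of RPostgreSQL): mark every character
# column of a data.frame as UTF-8 without changing the underlying bytes.
mark_utf8 <- function(df) {
  for (nm in names(df)) {
    if (is.character(df[[nm]])) {
      Encoding(df[[nm]]) <- "UTF-8"
    }
  }
  df
}

# Stand-in for a query result whose strings came back unmarked.
res <- data.frame(col1 = c("simple string", enc2utf8("\u00b1\u20ac\u03b1")),
                  col2 = c(1, 2), stringsAsFactors = FALSE)
Encoding(res$col1) <- "unknown"

fixed <- mark_utf8(res)
print(Encoding(fixed$col1))   # "unknown" "UTF-8"
```

With a live connection the call would presumably look like mark_utf8(dbGetQuery(con, "SELECT * FROM junk")). Pure ASCII strings still report "unknown" afterwards, because R never marks ASCII strings with an encoding.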

What version of the product are you using? On what operating system?

Windows 7, R 2.9.2, RPostgreSQL 0.4, working inside Eclipse IDE 4.2.1 with 
StatET

Please provide any additional information below.

Original issue reported on code.google.com by bmusi...@aptecgroup.com on 24 May 2013 at 12:00