lib / pq

Pure Go Postgres driver for database/sql
https://pkg.go.dev/github.com/lib/pq
MIT License
9.12k stars 911 forks source link

NULL handling for bulk COPY operations #591

Open mandolyte opened 7 years ago

mandolyte commented 7 years ago

I am bulk loading large CSV files, but have not found a way to indicate how to treat nulls. I am working around the problem by creating my target tables using all TEXT data types. But for dates, if the input is empty, it errs since it isn't properly formatted as a date.

I have searched the issues and documentation and haven't found a reference to how to set the PG option NULL 'null_string' (from https://www.postgresql.org/docs/9.2/static/sql-copy.html). I'm probably missing something obvious. If I am, treat this as a suggestion to update the godocs.

Thanks!

maddyblue commented 7 years ago

The docs there say

The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format.

Have you tried \N?

mandolyte commented 7 years ago

Yup... details (note, the database is actually Greenplum, a PG fork of around 8.2): The table to test:

create table geagp_des.testdates (
id integer,
datex timestamp
)
distributed by(ID);

The CSV file to load:

id,datex
1,2017-01-01
2,2016-12-25
3,2017-09-01
4,\N
5,2017-04-01
$

Finally script to run:

$ sh testdates.sh
Start Time: 2017-03-24 16:58:06.5622126 -0400 EDT
Changing all headers to lowercase!
2017/03/24 16:58:06 pq: invalid input syntax for type timestamp: "\N"  (seg25 avw7gpdbsdwq4.gphd.local:1026 pid=143359)
exit status 1
$ 

If I run with it just empty:

$ sh testdates.sh
Start Time: 2017-03-24 16:57:45.2060772 -0400 EDT
Changing all headers to lowercase!
2017/03/24 16:57:45 pq: invalid input syntax for type timestamp: ""  (seg25 avw7gpdbsdwq4.gphd.local:1026 pid=141741)
exit status 1
$

If it helps, here is the code snippet I'm using:

    // read loop for CSV
    var row int64
    for {
        // read the csv file
        cells, rerr := r.Read()
        if rerr == io.EOF {
            break
        }
        if rerr != nil {
            log.Fatal("r.Read() Error:" + rerr.Error())
        }

        args := make([]interface{}, len(hdrs))
        for n := range cells {
            args[n] = &cells[n]
        }

        _, err = stmt.Exec(args...)
        if err != nil {
            log.Fatalf("Error at row %v is:\n%v\nArgs:%v", row, err, cells)
        }
        row++
    }
FdeFabricio commented 7 years ago

I had a similar issue. In my case, I had to explicit set the arg as nil. Something like this:

args := make([]interface{}, len(hdrs))
for n := range cells {
    if len(cells[n]) == 0 {
        args[n] = nil
    } else {
        args[n] = &cells[n]
    }
}
julienmathevet commented 6 years ago

same error: pq: invalid input syntax for type numeric: "\N"

julienmathevet commented 6 years ago

workaround: add to copy query WITH NULL 'NULL' and set NULL instead of \N