ossc-db / pg_bulkload

High speed data loading utility for PostgreSQL
http://ossc-db.github.io/pg_bulkload/index.html

Error in `tuplesort_performsort` #44

Open ashiklom opened 7 years ago

ashiklom commented 7 years ago

I'm using pg_bulkload to import a bunch of large (3-5 million row x 4 column) CSV files into a PostgreSQL database. The first two import very quickly and without issue, but subsequent ones fail with the following error:

ERROR: copy failed: ERROR:  faild in tuplesort_performsort

For reference, my control file looks like this:

OUTPUT = spectra_data
TYPE = CSV
DELIMITER = ","

...and the command I'm running looks like this:

pg_bulkload -d leaf_spectra control.ctl -i stdin < data.csv

(Actually, the command and files are generated automatically by an R system call...but that's effectively what it is).

The table I'm trying to import into looks like this:

                                  Table "public.spectra_data"                                                                                                         [0/1862]
    Column     |  Type   |                              Modifiers
---------------+---------+----------------------------------------------------------------------
 spectradataid | bigint  | not null default nextval('spectra_data_spectradataid_seq'::regclass)
 spectraid     | bigint  | not null
 wavelength    | numeric |
 spectravalue  | numeric |
Indexes:
    "spectra_data_pkey" PRIMARY KEY, btree (spectradataid)
Check constraints:
    "legal_wavelength" CHECK (wavelength > 0::numeric)
Foreign-key constraints:
    "spectra_data_spectraid_fkey" FOREIGN KEY (spectraid) REFERENCES spectra_info(spectraid) ON DELETE CASCADE

Any idea why?

FYI, I'm running Postgres version 9.6.1 (latest) on Arch Linux, with an install of pg_bulkload from the master branch earlier today (Feb 7, 2017).

bwtakacy commented 7 years ago

Hmm, it looks like pg_bulkload hit this error during index rebuilding at the end of the data loading stage.

Could you give us a data file that reproduces this?
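
In the meantime, you could check whether the failed rebuild left any index flagged as invalid. This is just a generic catalog query, not anything pg_bulkload-specific:

-- Lists indexes that PostgreSQL has marked as not valid.
SELECT indrelid::regclass   AS table_name,
       indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;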

arshad88 commented 7 years ago

Hello, I'm also facing the same issue. Is there another (possibly closed) issue that covers this?

silvianacmp commented 7 years ago

Hey, any update on this issue? I am facing it as well. The first time I ran pg_bulkload the import was successful. Subsequently, I deleted all the records inserted in the table and retried the same load; the second time I got this error.

silvianacmp commented 7 years ago

I've managed to get rid of the error in the meantime. In my case, the problem was that I deleted the records simply using DELETE FROM, which does not physically remove them. After running a VACUUM FULL command on the table, the load worked again.
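
For anyone hitting the same thing, a minimal SQL sketch of that workaround, using the OP's spectra_data table as a stand-in for whatever table you are loading into:

-- DELETE only marks the rows dead; the tuples stay in the heap until vacuumed.
DELETE FROM spectra_data;

-- VACUUM FULL rewrites the table and physically removes the dead tuples.
VACUUM FULL spectra_data;

-- If the goal is simply to empty the table, TRUNCATE does both at once:
-- TRUNCATE spectra_data;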

KeizoUehara commented 6 years ago

The following procedure reproduces this error:

1. Generate 200,000 lines of random 30-character data:

    # Python 2 script; run as e.g. "python gen_data.py 200000 > test.csv" (script name is arbitrary)
    import sys
    import random

    source_str = 'abcdefhijklmnopqrstuvwxyz0123456789'
    random.seed(1)
    num = int(sys.argv[1])
    for i in range(num):
        print("".join([random.choice(source_str) for x in xrange(30)]))


2. Set maintenance_work_mem to 5MB (one way to do this is sketched below, after the control file).
3. Create the table:

psql testdb -c "CREATE TABLE test(id TEXT PRIMARY KEY);"


4. Execute pg_bulkload:

cat test.csv | /usr/pgsql-9.6/bin/pg_bulkload -d testdb test.ctl

The control file:

INPUT = stdin
OUTPUT = public.test
TYPE = CSV
QUOTE = "\""
ESCAPE = \
DELIMITER = ","
NULL = NULL
DUPLICATE_ERRORS = INFINITE
ON_DUPLICATE_KEEP = NEW



pg_bulkload finishes successfully when maintenance_work_mem is 20MB.
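
For step 2, note that a plain SET in a separate psql session will not reach the backend that pg_bulkload connects with; one way to lower the limit is a per-database setting (a sketch, assuming the testdb database from step 3):

-- New connections to testdb (including the one pg_bulkload opens) pick this up.
ALTER DATABASE testdb SET maintenance_work_mem = '5MB';

-- Undo it after the test.
ALTER DATABASE testdb RESET maintenance_work_mem;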

KeizoUehara commented 6 years ago

CentOS Linux release 7.4, pg_bulkload 3.1.14

This error occurred with postgresql-9.6.6 and postgresql-10.1.

No error occurred with postgresql-9.5.10.

egalot commented 6 years ago

Solved it by increasing the "maintenance_work_mem" parameter to 128MB in the "postgresql.conf" configuration file and reloading the server configuration (a full restart also works). Verify that the adjustment was made by running the query: "show maintenance_work_mem"
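
A minimal sketch of that fix for anyone who prefers not to edit postgresql.conf by hand; maintenance_work_mem is reloadable, so a configuration reload is enough:

-- Check the current value.
SHOW maintenance_work_mem;

-- Raise it; ALTER SYSTEM writes the setting to postgresql.auto.conf.
ALTER SYSTEM SET maintenance_work_mem = '128MB';
SELECT pg_reload_conf();

-- Verify that the change took effect (new sessions will report 128MB).
SHOW maintenance_work_mem;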