mla / pg_sample

PostgreSQL utility for creating a small, sample database from a larger one

DBD::Pg::db do failed: ERROR: #5

Open lpossamai opened 8 years ago

lpossamai commented 8 years ago

I'm using PostgreSQL 9.2.15-1PGDG.rhel6 on a Centos 6.7 64 Bits.

When executing the command: ./pg_sample dbname --data-only --file=test.sql --limit="ja_admins = 1000"

I got the following error:

Creating table "_pg_sample"."pg_catalog_pg_statistic" DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0xfdec08)', undef) called at ./pg_sample line 538
Dropping sample schema _pg_sample
Done.

Using --trace option:

)= undef at pg_sample line 538
    -> HandleError on DBI::db=HASH(0x289b660) via CODE(0x2858218) (undef)
DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0x289b738)', undef) called at ./pg_sample line 538
       ERROR: 7 'ERROR:  column "stavalues1" has pseudo-type "anyarray"' (err#0)
    <- DESTROY(DBI::st=HASH(0x289bb70))= undef at Carp.pm line 45
Dropping sample schema _pg_sample
    !! The ERROR '7' was CLEARED by call to do method
    <- do('DROP SCHEMA _pg_sample CASCADE')= '0E0' at pg_sample line 713
Done.
!   <- DESTROY(DBI::db=HASH(0x289b738))= undef at pg_sample line 45 during global destruction

What can I do to solve this issue?

mla commented 8 years ago

Hi! Can you give me the part of your schema that involves the pseudo-type? I've never used pseudo-types before, so I'm not sure what's happening.

If you have a dump of the schema, can you see where stavalues1 is referenced?

lpossamai commented 8 years ago

Hi! Yep.. I would give it to you if I could find it....

I've got the schema dump but was unable to find the reference in it...

the error says:

Creating table "_pg_sample"."pg_catalog_pg_shseclabel" 0
Creating table "_pg_sample"."pg_catalog_pg_statistic" DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo-type "anyarray" at ./pg_sample line 296
    main::__ANON__('DBD::Pg::db do failed: ERROR:  column "stavalues1" has pseudo...', 'DBI::db=HASH(0x24b77c8)', undef) called at ./pg_sample line 538
Dropping sample schema _pg_sample
Done.

Couldn't find anything related to "stavalues1" or "anyarray"

Do you have any tips?

lpossamai commented 8 years ago

I can see it through DbVisualizer:

Table pg_catalog.pg_statistic:

CREATE TABLE
    pg_statistic
    (
        starelid "OID" NOT NULL,
        staattnum SMALLINT NOT NULL,
        stainherit BOOLEAN NOT NULL,
        stanullfrac REAL NOT NULL,
        stawidth INTEGER NOT NULL,
        stadistinct REAL NOT NULL,
        stakind1 SMALLINT NOT NULL,
        stakind2 SMALLINT NOT NULL,
        stakind3 SMALLINT NOT NULL,
        stakind4 SMALLINT NOT NULL,
        stakind5 SMALLINT NOT NULL,
        staop1 "OID" NOT NULL,
        staop2 "OID" NOT NULL,
        staop3 "OID" NOT NULL,
        staop4 "OID" NOT NULL,
        staop5 "OID" NOT NULL,
        stanumbers1 REAL[],
        stanumbers2 REAL[],
        stanumbers3 REAL[],
        stanumbers4 REAL[],
        stanumbers5 REAL[],
        stavalues1 "ANYARRAY",
        stavalues2 "ANYARRAY",
        stavalues3 "ANYARRAY",
        stavalues4 "ANYARRAY",
        stavalues5 "ANYARRAY"
    );
mla commented 8 years ago

I don't know why it's trying to sample pg_catalog.* stuff. Try this patch. Make sure you use a test db. This is completely untested.

--- pg_sample.orig  2016-03-03 00:21:46.932012688 -0800
+++ pg_sample   2016-03-03 00:22:29.460012847 -0800
@@ -504,6 +504,8 @@
   my $sname = $row->{pg_schema} || unquote_identifier($row->{TABLE_SCHEM})
     or die "no pg_schema or TABLE_SCHEM value?!";

+  next if $sname eq 'pg_catalog';
+
   my $tname = $row->{pg_table} || unquote_identifier($row->{TABLE_NAME})
     or die "no pg_table or TABLE_NAME value?!";
lpossamai commented 8 years ago

@mla thanks.. The patch worked. But I got another error now =\

Error: Copying "dm"."invoices" ("invoice_id") rows referenced from "_pg_sample"."dm_bill_items" ("invoice_id")... DBD::Pg::db do failed: ERROR: could not identify an equality operator for type "json"

Table dm.invoices:

CREATE TABLE
    invoices
    (
        invoice_id BIGINT DEFAULT "nextval"('"dm"."invoices_invoice_id_seq"'::"regclass") NOT NULL,
        format_version CHARACTER VARYING(10) NOT NULL,
        ts_issue TIMESTAMP(6) WITH TIME ZONE NOT NULL,
        ts_ack TIMESTAMP(6) WITH TIME ZONE,
        customer_id BIGINT NOT NULL,
        code CHARACTER VARYING(32),
        tag CHARACTER VARYING(256),
        account_data "JSON" NOT NULL,
        customer_data "JSON" NOT NULL,
        invoice_data "JSON" NOT NULL,
        invoice_items_data "JSON"[],
        CONSTRAINT pk_invoices PRIMARY KEY (invoice_id),
        CONSTRAINT fk_customer_must_exist FOREIGN KEY (customer_id) REFERENCES
        geoop_uat_old.public.ja_customers (id),
        CONSTRAINT cc_ack_must_be_after_issue CHECK (COALESCE("ts_ack", "ts_issue") >= "ts_issue"),
        CONSTRAINT cc_ack_and_code_populated_together CHECK (("ts_ack" IS NULL) = ("code" IS NULL))
    );
mla commented 8 years ago

There must be an issue with json columns. I'll have to try to reproduce it. It's late now, sorry, but I'll investigate tomorrow.

lpossamai commented 8 years ago

@mla you've helped a lot! It would be great if you could have a look at that issue for me (when you have time, of course).

Thank you!

lpossamai commented 8 years ago

@mla I was also looking for a --exclude-table-data= option.. is there one using pg_sample? Cheers

rjuju commented 8 years ago

There's no equality operator for json. The usual trick is to cast the json fields to jsonb if available, or text:

# select '{}'::json = '{}'::json;
ERROR:  operator does not exist: json = json
LINE 1: select '{}'::json = '{}'::json;

# select '{}'::json::text = '{}'::json::text;
 ?column? 
══════════
 t
(1 row)

Unfortunately, you can get wrong results with json data cast to text, depending on how it was inserted:

# select '{"b": 1, "a": 2}'::json::text = '{"a": 2, "b":1}'::json::text;
 ?column? 
══════════
 f
(1 row)

# select '{"b": 1, "a": 2}'::jsonb = '{"a": 2, "b":1}'::jsonb;
 ?column? 
══════════
 t
(1 row)

Hope that helps.

mla commented 8 years ago

@mla I was also looking for a --exclude-table-data= option.. is there one using pg_sample?

What would that do? Sorry, not following.

mla commented 8 years ago

Looks like the JSON issue is related to the "SELECT DISTINCT" around line 620. It can't do the DISTINCT because it doesn't understand the equality. @lpossamai, try just removing that DISTINCT clause.

@@ -615,14 +617,14 @@
     # satisfy the fk table, except those already present.
     my $query = qq{
       INSERT INTO $target_sample_table
-           SELECT DISTINCT t1.*
+           SELECT t1.*
              FROM $target_table t1
                   JOIN $sample_fk_table f1 ON ($join1)
                   LEFT JOIN $target_sample_table s1 ON ($join2)
             WHERE $where
     };
lpossamai commented 8 years ago

@mla yep.. removing "DISTINCT" worked... Thanks for that!

about my other question:

on pg_dump docs:

-T table, --exclude-table=table: Do not dump any tables matching the table pattern. The pattern is interpreted according to the same rules as for -t. -T can be given more than once to exclude tables matching any of several patterns.

source: http://www.postgresql.org/docs/9.3/static/app-pgdump.html

Is there any option on pg_sample that does the same thing? Couldn't find...

lpossamai commented 8 years ago

well.. I'll just use the --limit="junk.* = 0" option.. it's fine. Thanks for your help @mla. You can close this issue.

lpossamai commented 8 years ago

@mla

The dump has been successfully created.

Command: ./pg_sample dbname --data-only --file=all_data.sql --limit="public.* = 1000" --limit="dm.* = 1000" --limit="integrations.* = *" --limit="gorfs.* = 100" --limit="junk.* = 0" --verbose

I've just got the error below:

psql:all_data_second.sql:3404263: ERROR:  duplicate key value violates unique constraint "pk_billables"
DETAIL:  Key ("billable_id")=(17) already exists.
CONTEXT:  COPY billables, line 513
psql:all_data_second.sql:10586440: ERROR:  duplicate key value violates unique constraint "pk_bills"
DETAIL:  Key ("bill_id")=(764047) already exists.
CONTEXT:  COPY bills, line 525
psql:all_data_second.sql:10938107: ERROR:  duplicate key value violates unique constraint "pk_invoices"
DETAIL:  Key ("invoice_id")=(340990) already exists.
CONTEXT:  COPY invoices, line 239084
psql:all_data_second.sql:12336660: ERROR:  duplicate key value violates unique constraint "pk_tax_aggregates"
DETAIL:  Key ("tax_aggregate_id")=(634898) already exists.
CONTEXT:  COPY tax_aggregates, line 595
psql:all_data_second.sql:26545982: ERROR:  duplicate key value violates unique constraint "ja_clients_pkey"
DETAIL:  Key ("id")=(458) already exists.
CONTEXT:  COPY ja_clients, line 550
psql:all_data_second.sql:28810421: ERROR:  duplicate key value violates unique constraint "ja_customers_pkey"
DETAIL:  Key ("id")=(338401) already exists.
CONTEXT:  COPY ja_customers, line 505
psql:all_data_second.sql:28818328: ERROR:  duplicate key value violates unique constraint "ja_mobiusers_pkey"
DETAIL:  Key ("id")=(683341) already exists.
CONTEXT:  COPY ja_mobiusers, line 520
psql:all_data_second.sql:28821661: ERROR:  duplicate key value violates unique constraint "pk_role"
DETAIL:  Key ("id")=(1731) already exists.
CONTEXT:  COPY ja_role, line 514
psql:all_data_second.sql:28828033: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

So when I do the import, I don't get the data from those tables.. which is bad.

Any idea why? Cheers

mla commented 8 years ago

I bet it was the fact that we removed the DISTINCT. I'll investigate how to do DISTINCT with JSON columns.


mla commented 8 years ago

I think we need to convert the DISTINCT to be DISTINCT ON (). I'll try to work on it tomorrow. Thanks for working through this.
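
As a sketch of what DISTINCT ON buys here (using the dm.invoices table from earlier in this thread as the illustration): DISTINCT ON compares only the listed key columns, so the json columns in the row are carried along but never compared and no json equality operator is needed.

```sql
-- Deduplicate by primary key only; the json columns are never compared
SELECT DISTINCT ON (invoice_id) *
  FROM dm.invoices
 ORDER BY invoice_id;
```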

lpossamai commented 8 years ago

That's great @mla - Thanks for your time and help...

mla commented 8 years ago

@lpossamai give this patch a try, please:

@@ -615,7 +617,8 @@ while ($num_rows) {
     # satisfy the fk table, except those already present.
     my $query = qq{
       INSERT INTO $target_sample_table
-           SELECT DISTINCT t1.*
+           SELECT DISTINCT ON ($target_cols)
+                  t1.*
              FROM $target_table t1
                   JOIN $sample_fk_table f1 ON ($join1)
                   LEFT JOIN $target_sample_table s1 ON ($join2)
lpossamai commented 8 years ago

@mla - Yep. That patch worked. No more errors.

I'm just trying to understand something:

I've used this command: ./pg_sample dbtest --file=all_data_3.sql --limit="public.* = 500" --limit="dm.* = 500" --limit="integrations.* = *" --limit="gorfs.* = 50" --limit="junk.* = 0" --verbose

This means I'm trying to get 500 rows from every table in the "public" and "dm" schemas, all rows from "integrations", 50 rows from "gorfs", and 0 rows from "junk" (right?).

But, after exporting and importing it to a new DB, when I do like:

select count(*) from public.ja_clients;

I get: 
count 
-------
  6249

and when I run the same query on another table in the public schema:

select count(*) from public.ja_customers;
count 
-------
     0

Why does that happen? I should have 500 each.. shouldn't I?

The same happens on others schemas:

select count(*) from dm.bills;
  count  
---------
 1191268
(1 row)

Thanks for your prompt reply and help!

UPDATE - related to ja_customers:

I've just seen an error here (I got these errors when importing all the exported data: psql dbname -f filefromPG_SAMPLE.sql):

psql:all_data_schema.sql:21171292: ERROR:  value too long for type character varying(255)
CONTEXT:  COPY ja_customers, line 428054, column company: "NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC *Blinds Online–Home Brand* *..."
psql:all_data_schema.sql:21188979: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

table ja_customers:

CREATE TABLE
    ja_customers
    (
        id BIGINT DEFAULT "nextval"('"ja_customers_id_seq"'::"regclass") NOT NULL,
        clientid BIGINT DEFAULT 0 NOT NULL,
        name_first CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        name_last CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        company CHARACTER VARYING(255) DEFAULT ''::CHARACTER VARYING,
        businesstype CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        position CHARACTER VARYING(80) DEFAULT ''::CHARACTER VARYING,
        email CHARACTER VARYING(50) DEFAULT NULL::CHARACTER VARYING,
        address CHARACTER VARYING(250),
        suburb CHARACTER VARYING(250),
        city CHARACTER VARYING(80),
        phone CHARACTER VARYING(50) DEFAULT NULL::CHARACTER VARYING,
        mobile CHARACTER VARYING(40) DEFAULT NULL::CHARACTER VARYING,
        fax CHARACTER VARYING(40) DEFAULT NULL::CHARACTER VARYING,
        mailing_address CHARACTER VARYING(250),
        mailing_suburb CHARACTER VARYING(250),
        mailing_city CHARACTER VARYING(80),
        regdate BIGINT DEFAULT 0 NOT NULL,
        notes "TEXT" DEFAULT ''::"text",
        gps_lat NUMERIC(15,10),
        gps_long NUMERIC(15,10),
        hourlyrate NUMERIC(6,2) DEFAULT 0 NOT NULL,
        onaccount SMALLINT DEFAULT 0 NOT NULL,
        calc_mobile CHARACTER VARYING(20),
        country_code NUMERIC(10,1),
        client_code CHARACTER VARYING(50),
        geocode_accuracy SMALLINT DEFAULT 0,
        crm_accountid CHARACTER VARYING(20),
        crm_update BOOLEAN,
        deleted BOOLEAN DEFAULT false NOT NULL,
        address_bldg CHARACTER VARYING(150),
        schedule_key CHARACTER VARYING(255),
        schedule_key_timestamp INTEGER,
        login CHARACTER VARYING(30) DEFAULT ''::CHARACTER VARYING NOT NULL,
        pass CHARACTER VARYING(50) DEFAULT ''::CHARACTER VARYING NOT NULL,
        default_bill_customer BIGINT,
        url CHARACTER VARYING(64),
        postcode CHARACTER VARYING(20),
        c_invoice_count INTEGER,
        mailing_postcode CHARACTER VARYING(20),
        created_date TIMESTAMP(6) WITHOUT TIME ZONE DEFAULT "statement_timestamp"(),
        modified_date TIMESTAMP(6) WITHOUT TIME ZONE DEFAULT "statement_timestamp"(),
        created_userid INTEGER,
        modified_userid INTEGER,
        stpr CHARACTER VARYING(80),
        mailing_stpr CHARACTER VARYING(80),
        is_demo BOOLEAN DEFAULT false NOT NULL,
        tax_aggregate_id_cost BIGINT,
        tax_aggregate_id_price BIGINT,
        job_share_mode SMALLINT DEFAULT 0 NOT NULL,
        PRIMARY KEY (id),
        CONSTRAINT fk_tax_aggregate_cost_must_exist FOREIGN KEY (tax_aggregate_id_cost) REFERENCES
        dbname_uat.dm.tax_aggregates (tax_aggregate_id),
        CONSTRAINT fk_tax_aggregate_price_must_exist FOREIGN KEY (tax_aggregate_id_price)
        REFERENCES dbname_uat.dm.tax_aggregates (tax_aggregate_id),
        CONSTRAINT fk_account_must_exist FOREIGN KEY (clientid) REFERENCES ja_clients (id),
        CONSTRAINT fk_default_bill_customer_must_exist FOREIGN KEY (default_bill_customer)
        REFERENCES ja_customers (id),
        CONSTRAINT cc_gps_lat_and_long_must_come_together CHECK (("gps_long" IS NULL) = ("gps_lat"
        IS NULL))
    );

Some other information: -bash-4.1$ ./pg_sample dbname_uat --file=all_dataschema.sql --limit="public.* = 500" --limit="dm.* = 500" --limit="integrations.* = *" --limit="gorfs.* = 50" --limit="junk.* = 0" --verbose

Server encoding is UTF8
Client encoding is UTF8
Exporting schema
Creating sample schema _pg_sample
[limit] public.* = 500
[limit] dm.* = 500
[limit] integrations.* = *
[limit] gorfs.* = 50
[limit] junk.* = 0
[limit] .* = 100

[...]

Creating table "_pg_sample"."public_ja_customers" 500

[...]

Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_customers" ("default_bill_customer")... 119 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_billables_links" ("customer_id")... 183 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."integrations_customers" ("customer_id")... 2007308 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_invoices" ("customer_id")... 18133 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."dm_bills" ("customer_id")... 55745 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_jobs" ("customerid")... 304 rows
Copying "public"."ja_customers" ("id") rows referenced from "_pg_sample"."public_ja_jobs" ("bill_customer")... 11 rows

[...]
mla commented 8 years ago

@lpossamai, you should never get an empty table, unless you request 0 rows.

Getting more rows than requested is definitely not uncommon though. That will be dictated by the foreign key constraints. e.g., you may request only 500 rows from table A, but if you include 10k rows from table B, and each row from table B has a FK to table A, you'll end up with however many rows from table A were needed to satisfy table B's FKs.
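
As a rough sketch of that behavior (table and column names here are hypothetical, not from pg_sample itself), a sampling tool has to keep copying referenced parent rows until every foreign key in the sample can be satisfied:

```sql
-- Hypothetical schema: sample_b holds the sampled child rows.
-- Every parent row of a that any sampled b row references must be
-- copied into sample_a, regardless of a's own requested row limit.
INSERT INTO sample_a
SELECT DISTINCT ON (a.id) a.*
  FROM a
  JOIN sample_b b ON b.a_id = a.id
  LEFT JOIN sample_a s ON s.id = a.id
 WHERE s.id IS NULL;  -- skip parent rows already present in the sample
```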

Regarding the "psql:all_data_schema.sql:21171292: ERROR: value too long for type character varying(255)" error, how did you create the new DB that you're loading into? I see that the existing DB is using UTF8. Is the new DB using UTF8 too? Looking at the error message, that string looks messed up. Like it's treating the UTF8 value as Latin1. And that could cause that error... the multi-byte UTF8 characters are being interpreted as single byte characters, causing the string to end up longer and exceeding the column width.
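
A minimal SQL sketch of that theory (assumes a UTF8 server encoding): the EN DASH in "Online–Home" is one character but three bytes in UTF-8, so re-decoding those bytes as Latin-1 triples its character count.

```sql
-- '–' (U+2013) is 1 character / 3 bytes in UTF-8; reading those
-- bytes back as LATIN1 yields 3 separate characters
SELECT length('–')                                             AS utf8_chars,
       octet_length('–')                                       AS utf8_bytes,
       length(convert_from(convert_to('–', 'UTF8'), 'LATIN1')) AS latin1_chars;
```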

mla commented 8 years ago

@lpossamai try this patch, please:

--- a/pg_sample
+++ b/pg_sample
@@ -192,6 +192,8 @@ use Getopt::Long qw/ GetOptions :config no_ignore_case /;
 use DBI;
 use DBD::Pg 2.0.0;

+binmode STDOUT, ':utf8';
+
 $SIG{TERM} = $SIG{INT} = $SIG{QUIT} = $SIG{HUP} = sub {
   my $signal = shift;
   die "Received signal '$signal'; cleaning up and terminating.\n";

It's not a real fix, but I'm wondering if we need to explicitly set STDOUT to use UTF8 encoding. Try running it again and see if you still get that "value too long" error.

mla commented 8 years ago

Hrm. I see we already set the encoding of STDOUT, so I doubt that change will do anything. What value of LANG do you have in your shell?

lpossamai commented 8 years ago

Hi @mla

My new DB is UTF-8 as well...

Shell = LANG=en_US.UTF-8

I'm currently testing the changes you've told me before about the STDOUT...

It's copying now.. once it's done I'll let you know.. thanks!

mla commented 8 years ago

When you view that row containing "NEW BLINDS: Supply & Delivery..." in psql, what does it look like? Do you see those messed up characters or something else?

lpossamai commented 8 years ago

I don't remember if I saw that during the pg_sample run or the import (using the psql command)

So.. I'm doing it again now and will pay attention to see if I get any errors.. shouldn't take too long..

lpossamai commented 8 years ago

@mla Yep.. still getting the errors when importing using psql:

psql:all_data_3.sql:21171222: ERROR:  value too long for type character varying(255)
CONTEXT:  COPY ja_customers, line 428054, column company: "NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC *Blinds Online–Home Brand* *..."
psql:all_data_3.sql:21188908: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1
mla commented 8 years ago

k, thanks. So in the source db, what does that row look like? Does it show up as "Blinds Online�Home Brand* *" there too?

lpossamai commented 8 years ago

If I could find it.. But I can't hahahaha =\

mla commented 8 years ago

SELECT company FROM ja_customers WHERE company LIKE 'NEW BLINDS%'

?

lpossamai commented 8 years ago

yep... thanks!

NEW BLINDS: Supply & Delivery (46mm Wooden blinds x 4) Colour: TBC Blinds Online–Home Brand * Doesn't includes check measure & professional install visits. * Please allow 15–18 working days for blind manufacture and del. * Freight included * 3 YEAR WA

mla commented 8 years ago

ok, good. So it seems like there's some encoding issue going on.

Are we sure the encoding of the source DB is UTF8? What does "psql -l" show for the source db?

mla commented 8 years ago

And that string is exactly 255 chars. So the theory about the multibyte being interpreted as individual chars still makes sense then.

lpossamai commented 8 years ago

Source DB:

source_uat | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres + | | | | | postgres=CTc/postgres

Dst DB: test_lucas2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8

mla commented 8 years ago

And if you grep the pg_sample output file for that string, you see the screwed up version, yes?

grep "NEW BLINDS" all_data_3.sql

lpossamai commented 8 years ago

yep....

2221550 8948    NB–Supply & Delivery   NEW BLINDS: Supply & Delivery\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\nDay/Night Dual Blind Systems x ?\n(Blockout Rollerblinds x ?)\n(Thermal Blockout Rollerblinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Wooden blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blind  \NUnspecified Units 0.0000  \N  0.0000  t   151029  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f
2221570 8948    NB–Supply & Install    NEW BLINDS: Supply & Install\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\nDay/Night Dual Blind Systems x ?\n(Blockout Roller blinds x ?)\n(Thermal Blockout Roller blinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Wooden blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blin  \NUnspecified Units 0.0000  \N  0.0000  t   151027  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f
2221572 8948    SUP/DEL NEW BLINDS >\nFor supply and delivery of...\n\nMeasurements by:\n(width x drop)(mm)(control side)\n\n\n(25mm Venetian blinds x ?)\n(50mm Venetian blinds x ?)\n(??mm Vertical blinds x ?)\n(Blockout Roller blinds x ?)\n(5% Sunscreen blinds x ?)\n(46mm Cedar blinds x ?)\n(50mm Cedar blinds x ?)\n(50mm Wood Vision bl  \N  Unspecified Units   0.0000  \N  0.0000  t   151027  2015-07-11 14:00:42.914421+00   2015-07-11 14:00:42.914421+00   \N  f
mla commented 8 years ago

What perl is being used?

/usr/bin/env perl -v

lpossamai commented 8 years ago

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

mla commented 8 years ago

You might try a more recent perl. I'm using 5.18 for this testing, which is itself fairly old now.

You can use the "plenv" tool to install local versions of perl. See: https://github.com/tokuhirom/plenv

mla commented 8 years ago

What version of DBD::Pg do you have installed?

perl -MDBD::Pg -le 'print DBD::Pg->VERSION'

lpossamai commented 8 years ago

ok so.. Perl has been updated to 5.18

and DBD::Pg to 3.5.3

I'll do the process again now... let u know once it's done

mla commented 8 years ago

Crashing for tonight. I was reading that perl 5.10 was pretty buggy, so maybe there were some unicode issues with it.

lpossamai commented 8 years ago

@mla after updating Perl I don't get that error anymore.. so.. PASS

I just got this error when importing the file:

psql:all_data_3.sql:21188918: ERROR:  duplicate key value violates unique constraint "spatial_ref_sys_pkey"
DETAIL:  Key ("srid")=(3819) already exists.
CONTEXT:  COPY spatial_ref_sys, line 1

Any idea?

mla commented 8 years ago

Excellent. Can you show me table definition for spatial_ref_sys including all constraints?


lpossamai commented 8 years ago

Sure

CREATE TABLE
    spatial_ref_sys
    (
        srid INTEGER NOT NULL,
        auth_name CHARACTER VARYING(256),
        auth_srid INTEGER,
        srtext CHARACTER VARYING(2048),
        proj4text CHARACTER VARYING(2048),
        PRIMARY KEY (srid),
        CONSTRAINT spatial_ref_sys_srid_check CHECK (("srid" > 0)
    AND (
            "srid" <= 998999))
    );
mla commented 8 years ago

How many tables have foreign keys to that table? Wondering if it's the DISTINCT issue again.
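
One way to check that via the system catalogs, as a sketch (assumes spatial_ref_sys is on the search path):

```sql
-- List foreign key constraints that reference spatial_ref_sys
SELECT conname,
       conrelid::regclass AS referencing_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'spatial_ref_sys'::regclass;
```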


lpossamai commented 8 years ago

Sorry @mla - any idea how I can test that? I'm not a Postgres expert...

mla commented 8 years ago

@lpossamai pull my latest version from the 1.10 branch, if you would: https://github.com/mla/pg_sample/tree/release/1.10

I added additional logging to --verbose mode to list all the foreign keys.

If you could, run it again with --verbose option, and upload the stderr output here (or email to me at maurice.aubrey@gmail.com).

lpossamai commented 8 years ago

@mla - yep.. doing it now. - Thanks for your help!

mla commented 8 years ago

Sure thing :) Thanks for your patience. We're finding a lot of interesting issues that haven't come up before.

Do you know if any of your tables use inheritance? That's an area I haven't tested before.
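
For reference, one way to check for inheritance via the system catalogs, as a sketch:

```sql
-- List every child/parent pair involved in table inheritance;
-- zero rows means no inheritance is in use
SELECT inhrelid::regclass  AS child,
       inhparent::regclass AS parent
  FROM pg_inherits;
```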

I have to run out for a few hours but will check in later tonight and see if we can figure out what's going on here. I think we're close.

lpossamai commented 8 years ago

Yeah! That's great @mla - You've been patient as well :P

Don't think we use inheritance - but there are lots of tables.. If you know a way to check all of them, I can test....

I've imported the file created by pg_sample and it works fine.... so we have time to solve the last error...

I have deleted the table definitions above for security reasons.