ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
3.87k stars 533 forks source link

Null's while import data from cli #6071

Closed SloNN closed 2 months ago

SloNN commented 3 months ago

Created a CSV-file as export from pg

date,boolean,uuid,bigint,timestamp without time zone,character varying,point,integer,double precision,numeric,json,smallint
2011-11-11,false,c91cd37b-f4c1-4dd4-83b1-a2c9db7ce411,138383838388389,2011-11-11 11:11:33.23,"as;djfa;skldf asdf as;dkfjasdf, asd,f asdf ","(12.34,56.78)",12,2342.234393993,12345.67,"{""a"":1}",3

Imported Data from that File with command

ydb -e grpcs://ydb-pg-5.ru-central1.internal:2135 -d /Root/database --ca-file ~/ydb-ansible-examples/TLS/CA/certs/ca.crt import file csv -p text pg_import_novb.csv --header

Table structure is

create table text (
`date` date,
`boolean` bool,
`uuid` uuid,
`bigint` int64, 
`timestamp_wo_tz` timestamp,
`character_varying` Utf8,
`character_varying__` Utf8,
--"point" point,
`integer` integer,
`double_precision` double,
`numeric` decimal(22,9),
`json` json,
`smallint` int16,
primary key(`bigint`)
)

After import process has ended I received the following Data in the YDB:

┌─────────────────┬─────────┬────────────────┬───────────┬──────────────┬──────────────────┬─────────┬─────────────┬────────────┬──────────┬─────────────────┬───────────┐
│ bigint          │ boolean │ character_vary │ character │ date         │ double_precision │ integer │ json        │ numeric    │ smallint │ timestamp_wo_tz │ uuid      │
│                 │         │ ing            │ _varying_ │              │                  │         │             │            │          │                 │           │
│                 │         │                │ _         │              │                  │         │             │            │          │                 │           │
├─────────────────┼─────────┼────────────────┼───────────┼──────────────┼──────────────────┼─────────┼─────────────┼────────────┼──────────┼─────────────────┼───────────┤
│ 138383838388389 │ false   │ null           │ null      │ "2011-11-11" │ null             │ 12      │ "{\"a\":1}" │ "12345.67" │ 3        │ null            │ "c91cd37b │
│                 │         │                │           │              │                  │         │             │            │          │                 │ -f4c1-4dd │
│                 │         │                │           │              │                  │         │             │            │          │                 │ 4-83b1-a2 │
│                 │         │                │           │              │                  │         │             │            │          │                 │ c9db7ce41 │
│                 │         │                │           │              │                  │         │             │            │          │                 │ 1"        │
└─────────────────┴─────────┴────────────────┴───────────┴──────────────┴──────────────────┴─────────┴─────────────┴────────────┴──────────┴─────────────────┴───────────┘

There are two issues:

  1. Unparsed fields were imported as NULL values without asking me. Say I have a "bad" value inside big CSV file, then I import it and have no chance to find broken fields, leading to incorrect results later
  2. Data needs to be imported without errors
pnv1 commented 2 months ago

Looks like the first row in csv file is built incorrectly.

ydb import sets null for columns that it can't find in csv and ignores columns in csv that are not present in table

SloNN commented 2 months ago

I'm sorry. The example was incorrect. Everything works fine