keboola / db-writer-redshift

Redshift database writer
MIT License
0 stars 0 forks source link

Columns Mismatch #11

Closed ondrejhlavacek closed 6 years ago

ondrejhlavacek commented 6 years ago

This table

https://connection.keboola.com/admin/projects/1505/storage#/tables/out.c-main.map_ga_data

gets exported to the writer

https://connection.keboola.com/admin/projects/1505/writers/keboola.wr-redshift-v2/215207512/table/out.c-main.map_ga_data

The temp table definition is

CREATE TABLE "map_ga_data_temp_5a6213e34ff0c" 
(
"map_data_id" INTEGER NULL ,
"returning_sessions" INTEGER NULL ,
"new_sessions" INTEGER NULL ,
"page_views" INTEGER NULL ,
"users" INTEGER NULL ,
"most_viewed_page" VARCHAR(255) NULL ,
"bounce_rate" REAL NULL ,
"average_session_duration" INTEGER NULL ,
"id" INTEGER NOT NULL 
);

but the exporting event says this order of columns

image

and the exported file has the columns ordered by the api call so it throws an error

{
    "query": "COPY \"map_ga_data_temp_5a6213e34ff0c\" FROM 's3://*' CREDENTIALS '*' REGION AS 'us-east-1' DELIMITER ',' CSV QUOTE '\"' NULL AS 'NULL' ACCEPTANYDATE TRUNCATECOLUMNS MANIFEST GZIP;",
    "redshift_errors": [
        {
            "userid": 1,
            "slice": 0,
            "tbl": 1,
            "starttime": "2018-01-19 15:51:28.253312",
            "session": 1,
            "query": 1,
            "filename": "*",
            "line_number": 1,
            "colname": "bounce_rate ",
            "type": "float ",
            "col_length": "0",
            "position": ...,
            "raw_line": "\"97\",\"4\",\"4\",\"2\",\"2\",\"180\",\"www***\",\"14\",\"1800\" ",
            "raw_field_value": "www.***",
            "err_code": 1207,
            "err_reason": "Invalid digit, Value 'w', Pos 0, Type: Float ",
        }
    ]
}

so there is an obvious columns mismatch (bounce_rate is fed content of most_viewed_page)

pivnicek commented 6 years ago

@ondrejhlavacek do you have a link to that export event? all the ones I've checked seem to have the correct column order.

MiroCillik commented 6 years ago

I got it https://connection.keboola.com/admin/projects/1505/jobs/348322967?q=%2Bparams.component%3Akeboola.wr-redshift-v2%20%2Bparams.config%3A215207512

MiroCillik commented 6 years ago

@ondrejhlavacek tu mam job, ktory skoncil errorm kovli mismatchi v data typoch, ale pritom ten exportny event ma rovnake poradie stlpcov ako temp tabulka:

columns
0: map_data_id
1: returning_sessions
2: new_sessions
3: page_views
4: users
5: most_viewed_page
6: bounce_rate
7: average_session_duration
8: id
'CREATE TABLE "map_ga_data_temp_5a646db6d6c98" (
    "map_data_id" VARCHAR(255) NOT NULL ,
    "returning_sessions" VARCHAR(255) NOT NULL ,
    "new_sessions" VARCHAR(255) NOT NULL ,
    "page_views" VARCHAR(255) NOT NULL ,
    "users" VARCHAR(255) NOT NULL ,
    "most_viewed_page" VARCHAR(255) NOT NULL ,
    "bounce_rate" VARCHAR(255) NOT NULL ,
    "average_session_duration" VARCHAR(255) NOT NULL ,
    "id" VARCHAR(255) NOT NULL 
);'

https://connection.keboola.com/admin/projects/1505/jobs/348322967?q=&eventId=694416439

MiroCillik commented 6 years ago

Ok, uz to vidim tu: https://connection.keboola.com/admin/projects/1505/jobs/347888737?q=%2Bparams.component%3Akeboola.wr-redshift-v2%20%2Bparams.config%3A215207512

MiroCillik commented 6 years ago

podarilo sa mi tu chybu nasimulovat https://connection.keboola.com/admin/projects/219/jobs/349904200

Spravil som to, ze som zo Storage zmazal prvy stlpec a potom ho rucne pridal (ako posledny). Tym vznikla diskrepancia medzi poradim stlpcov v Storage a v Input mappingu writeru:

input mapping:

    "returning_sessions",
    "new_sessions",
    "page_views",
    "users",
    "most_viewed_page",
    "bounce_rate",
    "average_session_duration",
    "id",
    "map_data_id"
]

storage:

returning_sessions
new_sessions    
page_views    
users    
most_viewed_page    
bounce_rate    
average_session_duration    
id

Co by znamenalo, ze bud sa stlpce nevyexportuju v poradi ako je definovane v IM alebo je nespravne poradie v manifeste, ktory prida docker runner. Respektive je mozne, ze v tom manifeste je poradie, ktore je aktualne v Storage (?)

Halama commented 6 years ago

@MiroCillik jo tak to bude varianta 2, docker runner dava do manifestu vsechny sloupce tabulky z jejich metadat https://github.com/keboola/input-mapping/blob/66ec27fb63d989f533a0b8c51e3e6577dd3ec05a/Reader/Reader.php#L332 To bude teda asi k diskuzi.

Halama commented 6 years ago

zitra jeste overim

Halama commented 6 years ago

varianta 1 tzn. ze by to spatne poslalo sapi staci zkontrolovat exportni event a prilozeny soubor ale tam je chyba dost nepravepodobna

Halama commented 6 years ago

Myslis si ze jinak spravne reseni je asi nepracovat s tim co je v manifestu ale vzit to z nastaveni storage.input

Halama commented 6 years ago

tady se to jeste radi podle toho co je v manifestu a zda se ze to nebude jediny writer tim postizeny https://github.com/keboola/db-writer-redshift/blob/c365efbd943a938dd9954244cc18bcf25079b631/src/Keboola/DbWriter/Redshift/Application.php#L28

Halama commented 6 years ago

A pokud se nemění ve zdrojové tabulce pořadí slouců během toho co konfigurace existuje tak se to neprojeví. Prottože ten wirter to ofiltruje podle toho co má v configu. Projeví se to v moment kdy se např. odebere sloupec zdrojové tabulky a pak zas přídá, tím se objeví na konci a bude tak i v tom manifestu. Vyexportovaný bude ale v takovém pořadí jako je v konfiguraci.

Halama commented 6 years ago

A tady je to nasimulovaný https://connection.keboola.com/admin/projects/4088/writers/keboola.wr-db-snowflake/349938950

MiroCillik commented 6 years ago

Jo myslel som, ze to bude tym manifestom, stacilo by mozno vyhodit tu reorder metodu?

On 26 Jan 2018 20:09, "Martin Halamíček" notifications@github.com wrote:

A tady je to nasimulovaný https://connection.keboola. com/admin/projects/4088/writers/keboola.wr-db-snowflake/349938950

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keboola/db-writer-redshift/issues/11#issuecomment-360876177, or mute the thread https://github.com/notifications/unsubscribe-auth/ABa0j0XcVR21rchv21oR8Z4sx0HuAa2gks5tOiLrgaJpZM4RlfaH .

Halama commented 6 years ago

jo myslim si ze jo. A naopak bych tam radsi pridal tu kontrolu ze sedi table config items s tim co je ve `storage.input.tables.columns!? https://github.com/keboola/db-writer-redshift/blob/master/src/Keboola/DbWriter/Redshift/Application.php#L28

Halama commented 6 years ago

Tak jsem prisel na to jak se jim zmenilo poradi sloupcu v te tabulce, dela to storage. Pokud má tabulka sloupce např. id,name a naimportuješ do ní name,id tak to změní to pořadí i v tabulce, což by teda dělat nemuselo.

ondrejhlavacek commented 6 years ago

@MiroCillik https://github.com/keboola/db-writer-redshift/issues/11#issuecomment-360726958 moc nerozumím?

MiroCillik commented 6 years ago

@ondrejhlavacek to neries :) @Halama vyhodim teda ten reordering a pridam kontrolu, ci sedi config s manifestom.

MiroCillik commented 6 years ago

Este by to chcelo opravit ten manifest si myslim, teda aby v tom manifeste boli stlpce v takom poradi, ako su vo vyexportovanom CSV

ondrejhlavacek commented 6 years ago

https://github.com/keboola/docker-bundle/issues/62#issuecomment-361212870

Halama commented 6 years ago

@MiroCillik kde to ma PR?

MiroCillik commented 6 years ago

@Halama tu https://github.com/keboola/db-writer-redshift/pull/12, som to mal v mastri, ale opravil som to ;) Mal som to este rozdelit do viacerych commitov..

Halama commented 6 years ago

to si teda mel :) Zitra to zkouknu

Halama commented 6 years ago

dalsi diskuze v #12

Halama commented 6 years ago

Jinak se to teda vyřešilo upravou docker runneru který do columns manifestu uvádí sloupce přesně tak jak byly vyexportovány.

V ticketu jsem odpověděl https://keboola.zendesk.com/agent/tickets/7840