lisad / phaser

The missing layer for complex data batch integration pipelines
MIT License

clevercsv puts "NULL" in columns that do not have values #82

Closed by jeffkole 8 months ago

jeffkole commented 8 months ago

In the following CSV file, the second row has no manager_id field at all (it ends after department). Its manager_id is passed to the cast function of IntColumn as the string "NULL", which causes a conversion error.

employeeNumber,firstName,lastName,payType,paidPer,payRate,bonusAmount,Status,department,manager_id
1,Benjamin,Sisko,"salary","Year","188625","30000",Active,Marketing,4
2,Kira,Nerys,"salary","Year","118625","20000",Active,Finance
,None,Garak,"salary","Year", 100000,,Inactive,Finance,
4,Rasma,Son,"salary","Year",230000,24000,Active,Marketing,
5,Aldina,Sharrow,"salary","Year",140000,18000,Active,Finance,2
6,Viktor,Matic,"salary","Year",180000,25000,Active,Finance,2

This bug shows up on main when clevercsv is used for reading and Pandas is used for writing. It only shows up when data is passed between phases, because Pandas is asked to write out "NULL" for empty columns.
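
For illustration, here is a minimal sketch of the failure and one possible workaround, using only the standard library `csv` module rather than clevercsv or phaser's own `IntColumn`. The `cast_int` helper, the `NULL_MARKERS` set, and the trimmed-down intermediate file are hypothetical names and data made up for this example. They assume the file passed between phases contains the literal text NULL where a value was missing.

```python
import csv
import io

# Hypothetical intermediate file, roughly what a phase might receive after
# the writer was asked to emit "NULL" for missing values.
INTERMEDIATE = """employeeNumber,firstName,manager_id
1,Benjamin,4
2,Kira,NULL
4,Rasma,NULL
5,Aldina,2
"""

# Assumption: an empty string and the literal "NULL" both mean "no value".
NULL_MARKERS = {"", "NULL"}


def cast_int(value):
    """Return int(value), or None when the value is a missing-value marker."""
    # csv.DictReader yields None for fields missing from a short row.
    if value is None or value.strip() in NULL_MARKERS:
        return None
    return int(value)


def read_manager_ids(text):
    """Parse the CSV text and cast the manager_id column row by row."""
    reader = csv.DictReader(io.StringIO(text))
    return [cast_int(row["manager_id"]) for row in reader]


# A plain int() cast fails on the marker, which is the error reported above.
try:
    int("NULL")
    naive_cast_failed = False
except ValueError:
    naive_cast_failed = True

manager_ids = read_manager_ids(INTERMEDIATE)
```

Treating the marker as missing at cast time is only one option. Writing empty fields instead of "NULL" between phases would avoid the problem at the source.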