springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

CSV files have double quotes in case of empty field values. How to avoid that? #86

Open sukanya-pai opened 3 years ago

sukanya-pai commented 3 years ago

Hi, I am trying to send a file from one server to a mainframe server over SFTP, so I am using this package. The problem is that blank or empty field values are written as double quotes instead of nothing. I tried the nullValue and emptyValue options with several values such as "", null, and "u/0000", but none of them works; I still see double quotes on the mainframe server.

dataFrame.coalesce(1).write.
        format("com.springml.spark.sftp").
        option("host", host).
        option("port", port).
        option("username", username).
        option("pem", privateKey).
        option("pemPassphrase", privateKeyPhrase).
        option("fileType", "csv").
        option("delimiter", DELIMITER).
        option("header","true").
        option("inferSchema", "true").
        option("nullValue", "").
        option("emptyValue","").
        save(filePath)

Required Output:

id | first_name | middle_name| last_name
1 | Sukanya| S | Pai
2| ABC ||XYZ

If the middle name is blank, the value should be blank as shown above. Currently, however, I get double quotes in the file:

id | first_name | middle_name| last_name
1 | Sukanya| S | Pai
2| ABC |""|XYZ

mrugankatdure commented 2 years ago

Try using the following:

.option("quote", "\u0000") .option("nullValue",null)

raj4j2ee commented 2 years ago

I am having a similar issue: I need an empty field instead of double quotes when there is no value. Please let me know if you found a fix.

deepankumaresan commented 1 year ago

Using

.option("emptyValue", null) .option("nullValue", null)

worked for me.

Source: https://stackoverflow.com/questions/62819776/spark-csv-writer-outputs-double-quotes-for-empty-string
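
If the connector does not pass emptyValue through to the underlying CSV writer (this may depend on the version and is not confirmed in this thread), a possible workaround is to turn empty strings into nulls before writing. Spark's built-in CSV writer emits null as the nullValue string, which defaults to an empty, unquoted field. The sketch below is only an illustration: the helper name blankToNull is hypothetical, it assumes column names without dots or other special characters, and it has not been tested against this connector.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

// Hypothetical helper: replaces empty strings with null in every
// string column, leaving columns of other types untouched.
def blankToNull(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == StringType) {
      val c = col(field.name)
      // Empty string becomes a typed null; other values pass through.
      acc.withColumn(
        field.name,
        when(c === "", lit(null).cast(StringType)).otherwise(c)
      )
    } else acc
  }

// Usage (same writer chain as in the question, options elided):
// blankToNull(dataFrame).coalesce(1).write.
//   format("com.springml.spark.sftp").
//   ...
//   save(filePath)
```

Since the empty values never reach the writer as empty strings, this approach does not rely on emptyValue being honored; it does rely on nulls being written as empty fields, which is worth verifying on a small sample first.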