springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Is it possible to add a DataFrame's data to an existing CSV file? #64

Open nurzhannogerbek opened 5 years ago

nurzhannogerbek commented 5 years ago

First of all, thank you for this wonderful library!

On a remote SFTP server I have a CSV file with some data. Is it possible to append a DataFrame's data to this existing file? In other words, the data already in the CSV file should not be overwritten. I noticed that the code below recreates the file.

import org.apache.spark.sql.DataFrame
import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`

val df: DataFrame = Seq(
    ("Alex", "2018-01-01 00:00:00", "2018-02-01 00:00:00", "OUT"),
    ("Bob", "2018-02-01 00:00:00", "2018-02-05 00:00:00", "IN"),
    ("Kate", "2018-02-01 00:00:00", "2018-02-05 00:00:00", "IN"),
    ("Alice", "2018-02-01 00:00:00", "2018-02-05 00:00:00", "OUT")
).toDF("FIRST_NAME", "START_DATE", "END_DATE", "STATUS")

df.write.
    format("com.springml.spark.sftp").
    option("host", "XXXX").
    option("username", "XXXX").
    option("password", "****").
    option("fileType", "csv").
    option("delimiter", ";").
    save("/PATH/test.csv")

nurzhannogerbek commented 5 years ago

I also tried the following code. Unfortunately, neither variant helped. I would be grateful for any help.

Variant A:

import org.apache.spark.sql.SaveMode

df.write.
    mode(SaveMode.Append).
    format("com.springml.spark.sftp").
    ...

Variant B:

df.write.
    mode("append").
    format("com.springml.spark.sftp").
    ...

samuel-pt commented 5 years ago

@NogerbekNurzhan - This is a good feature request. Unfortunately, we have limited time to work on it; we'll pick it up when we get some time. If possible, feel free to add the code to support this feature and open a PR.
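
Until append support exists in the connector, one possible workaround is to read the existing remote file, union it with the new rows, and write the combined result back. The following is only a minimal sketch, not a tested implementation. The helper name `appendToRemoteCsv`, the `sftpOptions` map, and the explicit `header` option are assumptions made for illustration, and the connection values are the same placeholders used in the question. It also assumes the header of the remote file matches the column names of the new DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SftpAppendSketch {
  // Connection options shared by the read and the write.
  // All values are placeholders; "header" is an assumption about the remote file.
  val sftpOptions: Map[String, String] = Map(
    "host"      -> "XXXX",
    "username"  -> "XXXX",
    "password"  -> "****",
    "fileType"  -> "csv",
    "delimiter" -> ";",
    "header"    -> "true"
  )

  // Hypothetical helper: emulates "append" by rewriting the whole remote file.
  def appendToRemoteCsv(spark: SparkSession, newRows: DataFrame, remotePath: String): Unit = {
    // Load the rows that are currently in the remote file.
    val existing = spark.read
      .format("com.springml.spark.sftp")
      .options(sftpOptions)
      .load(remotePath)
      .cache()

    // Materialize the existing rows before the remote file is replaced.
    // This is a precaution; whether the connector requires it is not confirmed.
    existing.count()

    // unionByName matches columns by name and fails if the names differ.
    val combined = existing.unionByName(newRows)

    // Write old and new rows back to the same remote path.
    combined.write
      .format("com.springml.spark.sftp")
      .options(sftpOptions)
      .save(remotePath)
  }
}
```

Calling `SftpAppendSketch.appendToRemoteCsv(spark, df, "/PATH/test.csv")` would then replace the remote file with the old and new rows combined. This rewrites the entire file on every call and is not atomic, so it is probably unsuitable for large files or for a path that several jobs write to at the same time.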