springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Fixes #21 #68

Open shaikmanu797 opened 5 years ago

shaikmanu797 commented 5 years ago

This PR reads the coalesce numPartitions value from a custom SparkSession config key, spark.sftp.coalesce.partitions. When the key is not set, numPartitions defaults to 1.

package com.springml.spark.sftp

import org.apache.spark.sql.SparkSession

object Driver extends App {
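  // Local session for the test run.
  val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
  // Override the default of 1 coalesce partition via the custom config key.
  spark.conf.set(constants.coalescePartitionsConfKey, 4)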

  val df = spark.read.
    format("com.springml.spark.sftp").
    option("host", "localhost").
    option("port", "2222").
    option("username", "foo").
    option("password", "pass").
    option("fileType", "csv").
    option("inferSchema", "true").
    load("/upload/airports.csv").
    repartition(8)

  df.write.
    format("com.springml.spark.sftp").
    option("host", "localhost").
    option("port", "2222").
    option("username", "foo").
    option("password", "pass").
    option("fileType", "csv").
    option("delimiter", ";").
    save("/upload/")

  spark.close()
}
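
The lookup described above (read the key, fall back to 1) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the object CoalesceConfig and the helper coalescePartitions are hypothetical names, and falling back to the default for non-numeric or non-positive values is an assumption about reasonable behavior. The lookup is passed in as a plain function so the sketch runs without a Spark dependency.

```scala
object CoalesceConfig {
  // Config key named in the PR description.
  val CoalescePartitionsConfKey = "spark.sftp.coalesce.partitions"

  // Default stated in the PR description.
  val DefaultCoalescePartitions = 1

  /** Hypothetical helper: resolve the coalesce partition count.
    *
    * `lookup` returns the raw config value for a key, if present.
    * Assumption (not from the PR): values that are missing, non-numeric,
    * or not positive fall back to the default of 1.
    */
  def coalescePartitions(lookup: String => Option[String]): Int =
    lookup(CoalescePartitionsConfKey)
      .flatMap(v => scala.util.Try(v.trim.toInt).toOption)
      .filter(_ > 0)
      .getOrElse(DefaultCoalescePartitions)
}
```

With a live session, the lookup could be supplied as `key => spark.conf.getOption(key)`, and the result passed to `coalesce` before the write.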

Test run log attached: test.log

shaikmanu797 commented 5 years ago

@samuel-pt, please review the PR and let me know if you think any additional changes should be made.

marcraminv commented 4 years ago

Hi @shaikmanu797 @samuel-pt, is there any forecast for when this feature will be merged to master? Thank you 👍!

shaikmanu797 commented 3 years ago

@marcraminv @fernandomora @vejeta @sunayansaikia @ezra-at-lumedic

Since this PR has not been reviewed for more than a year and the contributors to this repository seem to have been inactive for a long time, I went with my own implementation of the API to fix the issue. Feel free to take a look at the package; any feedback is highly appreciated.

https://github.com/arcizon/spark-filetransfer