springml / spark-sftp

Spark connector for SFTP
Apache License 2.0
100 stars, 98 forks

Add support for read/write xml files #36 #37

Closed: bagopalan closed this 5 years ago

bagopalan commented 5 years ago

Overview of the PR:

Presently, we support txt, Avro, Parquet, CSV, and JSON files. This PR adds support for XML files as well, using https://github.com/databricks/spark-xml, which provides XML connectivity for Spark.

We added spark-xml as a dependency of spark-sftp so that XML files can be read from and written to SFTP servers.
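To illustrate how a `fileType` value could be routed to the underlying Spark data source, here is a minimal, hypothetical sketch. The names `FileTypeFormats` and `formatFor` are invented for illustration and are not spark-sftp's actual code. The format strings assume Spark 2.x with the Databricks Avro and XML packages on the classpath; `com.databricks.spark.xml` is the data source name registered by spark-xml.

```scala
// Hypothetical sketch: NOT spark-sftp's real implementation.
object FileTypeFormats {
  // Assumed mapping from the connector's "fileType" option to a Spark
  // data source name (Spark 2.x plus the Databricks Avro/XML packages).
  private val formats: Map[String, String] = Map(
    "txt"     -> "text",
    "avro"    -> "com.databricks.spark.avro",
    "parquet" -> "parquet",
    "csv"     -> "csv",
    "json"    -> "json",
    "xml"     -> "com.databricks.spark.xml" // provided by the spark-xml dependency
  )

  // Returns the data source name, or an error message for unsupported types.
  // The lookup is case-insensitive.
  def formatFor(fileType: String): Either[String, String] =
    formats.get(fileType.toLowerCase) match {
      case Some(format) => Right(format)
      case None         => Left(s"Unsupported fileType: $fileType")
    }
}
```

For example, `FileTypeFormats.formatFor("xml")` yields `Right("com.databricks.spark.xml")`, while an unknown type such as `"orc"` yields a `Left` with an error message instead of failing later inside Spark.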

How the code was tested:

The code was tested using the following statements:

val df = spark.read.
  format("com.springml.spark.sftp").
  option("host", "SFTP-SERVER").
  option("username", "SFTP-USER").
  option("password", "****").
  option("fileType", "xml").
  option("rowTag", "YEAR").
  load("myxml.xml")

df.write.format("com.springml.spark.sftp").
  option("host", "SFTP-HOST").
  option("username", "SFTP-USER").
  option("password", "****").
  option("fileType", "xml").
  option("rootTag", "YTD").
  option("rowTag", "YEAR").
  save("myxmlOut.xml.gz")
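In the statements above, `rowTag` tells spark-xml which element delimits a row (`YEAR` here). The following sketch shows that idea using only the JDK's DOM parser: every element named `rowTag` becomes one row, and each of its child elements becomes a column. `RowTagSketch` is a hypothetical name, and this is a simplified stand-in, not how spark-xml actually parses (spark-xml splits large files across partitions instead of loading one DOM tree, and also infers types, attributes, and nested structures).

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical illustration of rowTag semantics; not spark-xml's parser.
object RowTagSketch {
  // Parses the whole document, then turns each <rowTag> element into a
  // Map of child-element name -> trimmed text content.
  def rows(xml: String, rowTag: String): Seq[Map[String, String]] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val rowNodes = doc.getElementsByTagName(rowTag) // all matches, in document order
    (0 until rowNodes.getLength).map { i =>
      val row = rowNodes.item(i).asInstanceOf[Element]
      val children = row.getChildNodes
      (0 until children.getLength)
        .map(j => children.item(j))
        .collect { case e: Element => e.getTagName -> e.getTextContent.trim } // skip text/whitespace nodes
        .toMap
    }
  }
}
```

With input such as `<YTD><YEAR><id>1</id><total>10</total></YEAR><YEAR><id>2</id><total>20</total></YEAR></YTD>` and `rowTag = "YEAR"`, this yields two rows, which matches how the `YEAR` elements in the tested file become DataFrame rows.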

Out of scope:

Presently, only basic read/write of XML files is supported, mainly through the rowTag and rootTag parameters. This is enough for basic reads and writes; support for more spark-xml parameters can be added in the future.
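To make the roles of `rootTag` and `rowTag` on the write side concrete, here is a minimal sketch that serializes rows of (column, value) pairs as `<rootTag><rowTag>...</rowTag></rootTag>`. `XmlWriteSketch` and `toXml` are hypothetical names; the fallback values `ROWS` and `ROW` mirror spark-xml's documented defaults for `rootTag` and `rowTag`. The real spark-xml writer additionally handles nulls, attributes, nested structs, and indentation, which this sketch omits.

```scala
// Hypothetical illustration of rootTag/rowTag on write; not spark-xml's writer.
object XmlWriteSketch {
  // spark-xml's defaults when rootTag/rowTag are not supplied.
  val DefaultRootTag = "ROWS"
  val DefaultRowTag = "ROW"

  // Escapes characters that must not appear verbatim in XML text content.
  // "&" is replaced first so the other replacements are not double-escaped.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Wraps every row in <rowTag> and all rows in a single <rootTag>.
  // Columns are written in the order given for each row.
  def toXml(rows: Seq[Seq[(String, String)]],
            options: Map[String, String] = Map.empty): String = {
    val rootTag = options.getOrElse("rootTag", DefaultRootTag)
    val rowTag = options.getOrElse("rowTag", DefaultRowTag)
    val body = rows.map { row =>
      val cols = row.map { case (name, value) => s"<$name>${escape(value)}</$name>" }.mkString
      s"<$rowTag>$cols</$rowTag>"
    }.mkString
    s"<$rootTag>$body</$rootTag>"
  }
}
```

With the options used in the tested write statement (`rootTag = "YTD"`, `rowTag = "YEAR"`), a single row `id=1, total=10` becomes `<YTD><YEAR><id>1</id><total>10</total></YEAR></YTD>`; omitting both options falls back to `<ROWS><ROW>...</ROW></ROWS>`.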

samuel-pt commented 5 years ago

Merged the changes. Thanks for creating this well-documented, neatly coded PR. We will release the library to the Maven repository and spark-packages.org next week.