springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

using spark sftp with databricks #70

Open jch231 opened 4 years ago

jch231 commented 4 years ago

I was wondering if you could help me with using spark-sftp with Databricks. Firstly, I am struggling to import the library in Databricks - the documentation has only a few examples on loading a dataframe, but nothing on how to import the library into the notebook itself. Secondly, is there a Python API for spark-sftp, or is the functionality only available in Scala? (I develop using pyspark but can get around this by loading the dataframe in Scala and creating a temp view to access it from Python.) Thanks!

FurcyPin commented 4 years ago

Hello,

All you have to do is make sure this project's jar is added to your Spark environment. I don't use Databricks, but I think you could start here: https://docs.databricks.com/libraries.html
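
For example, outside Databricks you can pull the package in when building the PySpark session. This is just a sketch: the Maven coordinate and version below are assumptions, so check the project's README for the artifact that matches your Scala/Spark version.

    from pyspark.sql import SparkSession

    # Download the spark-sftp package from Maven when the session starts.
    # The coordinate/version is an assumption -- verify it against the README.
    spark = (SparkSession.builder
             .appName("sftp-example")
             .config("spark.jars.packages", "com.springml:spark-sftp_2.11:1.1.3")
             .getOrCreate())

On Databricks itself, the equivalent is to attach the same Maven coordinate as a cluster library through the Libraries page linked above.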

As for the pyspark API, it is exactly the same. You can write

df = spark.read.format("com.springml.spark.sftp").options(...).load()

and it should work.
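
A slightly fuller sketch with the connection options spelled out (the host, credentials, and path are placeholders; the option names should match the ones documented in the project's README):

    # All values below are placeholders -- substitute your own SFTP details.
    df = (spark.read
          .format("com.springml.spark.sftp")
          .option("host", "sftp.example.com")
          .option("username", "myuser")
          .option("password", "mypassword")
          .option("fileType", "csv")
          .option("inferSchema", "true")
          .load("/remote/path/sample.csv"))

    df.show()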

And if you see this error:

java.lang.ClassNotFoundException: com.springml.spark.sftp.DefaultSource

it means the jar was not added correctly and your Spark installation cannot find it.