brynnborton opened this issue 5 years ago
@brynnborton - Will add this fix. Thanks.
Hello guys. I am getting the same error using the latest version (1.1.5) on Databricks. Is it really fixed now?
I've made a fix for this; what's the best way to push it?
@brynnborton @dirceusemighinieleflow I am still running into an NPE even after using 1.1.5 on Databricks. Any suggestions to resolve this, please?
Can you describe what you have done? What environment are you using, and which code are you running?
@dirceusemighini I am running this from Databricks and trying to FTP a file; code below. Also, some forums say that `hdfsTempLocation` should be used. Help me understand how this connector uses `hdfsTempLocation` and `tempLocation`. Can `hdfsTempLocation` be dbfs:/ (Databricks File System)?
```scala
%scala
val df = spark.read.text("dbfs:/databricks-datasets/online_retail/data-001/data.csv")
display(df)
df.write.
  format("com.springml.spark.sftp").
  option("host", "XXXXXX").
  option("username", "XXXXX").
  option("password", "XXXXXX").
  option("fileType", "csv").
  option("tempLocation", "dbfs:/databricks/test/").
  save("XXXXXXXX")
```
Error below:
```
at scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
at scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
at com.springml.spark.sftp.DefaultSource.copiedFile(DefaultSource.scala:275)
at com.springml.spark.sftp.DefaultSource.writeToTemp(DefaultSource.scala:264)
at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:130)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:134)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:187)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:183)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:134)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:116)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:116)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:710)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:111)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:240)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:97)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:170)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:710)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:306)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:292)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:235)
at linef54d033b830e42cbbaacd889ae183bb139.$read$$iw$$iw$$iw$$iw$$iw$$iw.
```
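For what it's worth, the top frames (`ArrayOps$ofRef.filter` called from `DefaultSource.copiedFile`) are exactly what you get when filtering a null array. A minimal sketch of that failure mode, assuming the connector resolves the temp location with `java.io.File` and filters the directory listing (the path below is just the one from the snippet above, used illustratively):

```scala
import java.io.File

// java.io.File#listFiles returns null when the path is not a readable
// local directory -- and a dbfs:/ URI is not one from the driver's JVM.
val tempDir = new File("dbfs:/databricks/test/") // not a local filesystem path
val copied  = tempDir.listFiles()                // null: no such local directory
copied.filter(_.getName.endsWith(".csv"))        // NullPointerException, matching the trace
```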
My main problem, if I recall correctly, is that Azure Databricks messes up the temp file location.
```scala
val tempFolder = "/dbfs/mount_point of/MYFOLDER/"
df.write.
  format("com.springml.spark.sftp").
  option("host", salesforceHost).
  option("username", user).
  option("password", pw).
  option("fileType", "csv").
  option("delimiter", ",").
  option("azuremountpoint", tempFolder).
  option("templocation", s"abfss://MYFOLDER").
  option("gen", "gen2").
  save(s"SFTP_FOLDER.csv")
```
`tempFolder` was the same folder as `templocation`, but `azuremountpoint` was the mount point of my Gen2 temp folder. Using the jar compiled from the code I've committed here, I could write to SFTP with the code shown above.
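In case it helps anyone reproducing this, a sketch of the mount that the snippet above assumes, using the standard Databricks `dbutils.fs.mount` API (the storage account, secret scope, and key names here are placeholders, not from the original comment):

```scala
// Mount the ADLS Gen2 container so the abfss:// temp location is also
// visible to the driver as a local /dbfs/... path (the azuremountpoint).
dbutils.fs.mount(
  source = "abfss://MYFOLDER@mystorageaccount.dfs.core.windows.net/",
  mountPoint = "/mnt/MYFOLDER",
  extraConfigs = Map(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net" ->
      dbutils.secrets.get(scope = "my-scope", key = "storage-account-key")
  )
)

// The connector can then read the copied file back through the local view:
val tempFolder = "/dbfs/mnt/MYFOLDER/"
```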
I receive the below error when trying to use this inside an Azure Databricks notebook. After some investigation, I believe the issue is in the step that copies the file to the local directory and then reads it again. It seems that when writing you need to specify "file:" at the beginning of the path, but when reading you do not need the "file:" part. I have managed to get it working using the SFTPClient directly, e.g. this works:
```scala
dbutils.fs.cp(partition_path, "file:/tmp/test3.csv")
sftpClient.copyToFTP("/tmp/test3.csv", "/")
```
Hence, I think that to fix it I need another option called something like `writetemplocation`.
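For anyone stuck on this in the meantime, here is an end-to-end sketch of that workaround. It only uses the calls quoted above (`dbutils.fs.cp` and `SFTPClient.copyToFTP`) plus standard Spark and dbutils APIs; the staging path is a placeholder, and `sftpClient` is assumed to be an already-constructed SFTPClient as in my snippet:

```scala
// 1. Write the DataFrame to DBFS as a single CSV part file.
df.coalesce(1)
  .write
  .option("header", "true")
  .csv("dbfs:/tmp/sftp_staging")

// 2. Find the part file Spark produced (its name is not deterministic).
val partition_path = dbutils.fs.ls("dbfs:/tmp/sftp_staging")
  .map(_.path)
  .filter(_.endsWith(".csv"))
  .head

// 3. Copy it to the driver's local disk. Note the "file:" prefix on the
//    destination, which is the crux of the issue described above.
dbutils.fs.cp(partition_path, "file:/tmp/test3.csv")

// 4. Push the local file to the SFTP server's root directory.
sftpClient.copyToFTP("/tmp/test3.csv", "/")
```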