springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Getting Path Does Not Exist when loading file from sftp #52

Open bennyblum opened 5 years ago

bennyblum commented 5 years ago

Possibly related to issue 24. The connection is made, but Spark is not able to locate the file in the temp directory on DBFS (HDFS).

Running Spark 2.4 on Databricks Runtime 5.2. Installed the package com.springml:spark-sftp_2.11:1.1.3 via Maven coordinates.

Code being executed:

val df = spark.read.format("com.springml.spark.sftp")
  .option("host", HOST)
  .option("port", PORT)
  .option("username", UN)
  .option("password", PWD)
  .option("fileType", "csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load(FILENAME)

Error response: org.apache.spark.sql.AnalysisException: Path does not exist: dbfs:/local_disk0/tmp/FILENAME

Complete stack trace: springml_spark_sftp_stacktrace.txt

samuel-pt commented 5 years ago

@bennyblum

Can you provide the temp folder as a parameter? You can use the "tempLocation" parameter to pass the temp folder location.

zidear commented 5 years ago

With option("tempLocation", "/dbfs/tmp/") the error was org.apache.spark.sql.AnalysisException: Path does not exist: dbfs:/dbfs/tmp/tmp.csv; but with option("tempLocation", "/tmp/") it changed to java.io.FileNotFoundException: /tmp/tmp.csv (No such file or directory).

I think it evidently does not handle the Spark path and the local path correctly when it downloads the file and then reads it. You can use option("createDF", "false") to download the file first and then use spark.read to get the DataFrame.
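
A rough sketch of that two-step workaround on Databricks, offered as an illustration rather than a confirmed fix. It relies on the fact that a driver-local path under /dbfs/ is the FUSE view of dbfs:/, so a file downloaded to /dbfs/tmp/ can be read by Spark as dbfs:/tmp/. The helper `DbfsPaths.toDbfsUri` is hypothetical (it is not part of spark-sftp), the downloaded file name tmp.csv is taken from the error message above, and the behavior of createDF=false is assumed to be download-only, as described in this comment.

```scala
// Hypothetical helper, not part of spark-sftp: maps a driver-local path under
// the Databricks /dbfs mount to the dbfs:/ URI that spark.read resolves.
object DbfsPaths {
  private val FuseRoot = "/dbfs/"

  def toDbfsUri(localPath: String): String = {
    // Fail fast on paths outside the FUSE mount, which Spark could not see.
    require(localPath.startsWith(FuseRoot), s"not under $FuseRoot: $localPath")
    "dbfs:/" + localPath.stripPrefix(FuseRoot)
  }
}

// Intended use in a Databricks notebook (HOST, UN, PWD, FILENAME as in the
// original post; assumes createDF=false only downloads the file):
//
//   spark.read.format("com.springml.spark.sftp")
//     .option("host", HOST).option("username", UN).option("password", PWD)
//     .option("fileType", "csv")
//     .option("tempLocation", "/dbfs/tmp/")
//     .option("createDF", "false")
//     .load(FILENAME)
//
//   val df = spark.read
//     .option("header", "true").option("inferSchema", "true")
//     .csv(DbfsPaths.toDbfsUri("/dbfs/tmp/tmp.csv"))
```

The point of the mapping is to avoid the dbfs:/dbfs/... path seen in the first error: the download step uses the local view of the file, and the read step uses the matching DBFS URI.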

harshpreet0904 commented 4 years ago

Hi, I am facing the same issue with spark-sftp version com.springml:spark-sftp_2.11:1.1.5. Were you able to fix it?

abhinavdangi commented 3 years ago

I am also facing the same issue with spark-sftp version com.springml:spark-sftp_2.11:1.1.0.

shaikmanu797 commented 3 years ago

@harshpreet0904 / @abhinavdangi,

Could you try my implementation of the Spark SFTP package to see if your workload runs?

https://github.com/arcizon/spark-filetransfer

abhinavdangi commented 3 years ago

@shaikmanu797, would it be possible for you to add it as a patch in this repo?

shaikmanu797 commented 3 years ago

@abhinavdangi I am not part of the developer group for this organization/repo, so I don't have write access.

I have had an open PR on this repo for about 18 months now, and no one from the org has reviewed it yet. Since this repo appears to be inactive, I went with my own implementation.