springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

File download Problem in spark executor in cluster mode #47

Open rabi112 opened 5 years ago

rabi112 commented 5 years ago

When the Spark driver and executor run on the same machine, the file is downloaded to /tmp/ and the executor can read it. When the executor runs on a different machine, the file is downloaded on the driver but not on the executor, and the executor throws this exception:

FileScanRDD:54 - Reading File path: file:///tmp/1546428988_Monthly_salary_csv.csv, range: 0-125929, partition values: [empty row]
2019-02-12 07:32:18 ERROR Executor:91 - Exception in task 0.3 in stage 56.0 (TID 85)
java.io.FileNotFoundException: File file:/tmp/1546428988_Monthly_salary_csv.csv does not exist

samuel-pt commented 5 years ago

@bini0209 Can you provide the temp folder as a parameter? You can use the "tempLocation" parameter to pass the temp folder location.
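For illustration, here is a minimal PySpark sketch of passing "tempLocation" as a read option. Only "tempLocation" is named in this thread. The format string "com.springml.spark.sftp" and the other option names (host, username, password, fileType) are assumptions based on the connector's README, and the helper names sftp_options and read_from_sftp are hypothetical. The spark argument is an existing SparkSession.

```python
def sftp_options(host, username, password, temp_location, file_type="csv"):
    """Build the option map for the SFTP read (hypothetical helper).

    Only "tempLocation" comes from this thread; the remaining keys are
    assumed from the connector's README.
    """
    return {
        "host": host,
        "username": username,
        "password": password,
        "fileType": file_type,
        # Directory where the connector stages the downloaded file.
        "tempLocation": temp_location,
    }


def read_from_sftp(spark, remote_path, **connection):
    """Load a remote file through the connector using the options above.

    `spark` is an existing SparkSession; the format string is an assumption.
    """
    reader = spark.read.format("com.springml.spark.sftp")
    return reader.options(**sftp_options(**connection)).load(remote_path)
```

A call might look like read_from_sftp(spark, "/remote/file.csv", host="sftp.example.com", username="user", password="secret", temp_location="/shared/tmp"), where every value is a placeholder. Since the error is raised on the executor, the temp location presumably has to be a path the executors can also read, such as a shared mount.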

mbidewell commented 4 years ago

I'm seeing this problem as well, even with tempLocation set. For some reason the file is downloaded only to the driver, so when the load step tries to create the DataFrame on the cluster, the executors cannot find the file.
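One possible workaround for small files is to parse the file on the driver and hand the rows to Spark, so that no executor has to open a driver-local path. This is a sketch under assumptions, not something the connector does: the helper names are hypothetical, the CSV is assumed to have a header row and to fit in driver memory, and the file is assumed to already be on the driver's local disk.

```python
import csv


def parse_csv_lines(lines):
    """Split an iterable of CSV text lines into (header, rows)."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = [tuple(row) for row in reader]
    return header, rows


def driver_file_to_dataframe(spark, local_path):
    """Read a driver-local CSV and build a DataFrame from its rows.

    `spark` is an existing SparkSession. All columns come back as strings,
    and a file with no data rows would need an explicit schema instead.
    """
    with open(local_path, newline="") as handle:
        header, rows = parse_csv_lines(handle)
    # The rows are shipped to the cluster as data, not as a file path.
    return spark.createDataFrame(rows, schema=header)
```

This trades scalability for simplicity. For larger files, a temp directory that every node can reach is probably the better fit.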