Closed: rbheemana closed this issue 7 years ago.
@rbheemana What version of Spark are you using? Is it 1.6.x or 2.x?
2.x
@rbheemana I am not able to reproduce this issue. I've tried it in Spark 2.1.0 without Hadoop. Please let me know the following details:
Hi,
Are you trying this in a notebook like Jupyter or Zeppelin? I have tried both Zeppelin and the command line; both gave me the same error.
Do you have Spark on top of Hadoop? Yes.
Which Hadoop distribution are you using (HDP, CDH, ...)? I have the latest version of HDP.
Or do you use any cloud setup like AWS EMR, Google Dataproc or Azure HDI?
My HDP is installed on AWS EC2 RHEL7 instances.
Are you using an internal SFTP server or an SFTP provider like BrickFTP? I used an internal SFTP server.
Thanks, Ram
@rbheemana Will set up an HDP cluster and update this ticket.
@rbheemana When we tried with an HDP cluster we got a similar issue, though not the same one. We've fixed it, and the changes are committed in https://github.com/springml/spark-sftp/commit/3720389daba7e325b8dfcba1c176c4878b7eddf3. Please try this fix and let us know if you still face the issue.
Please note that this fix has not yet been pushed to the Maven repository. To use it, build spark-sftp and include the resulting jar in spark-shell using the --jars option.
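For reference, a rough sketch of that workflow (the build command and jar path here are assumptions, not verified against the repo, so adjust them to whatever your checkout actually produces):

git clone https://github.com/springml/spark-sftp.git
cd spark-sftp
mvn -DskipTests package                      # or sbt package, whichever build file the checkout provides
spark-shell --jars target/spark-sftp-*.jar   # point --jars at the jar your build produced

If spark-shell then complains about missing classes, the connector's own dependencies (for example the JSch-based SFTP client) may need to be added to --jars as well.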
Thanks for fixing it. It is working perfectly now.
I ran a Spark SMOTE program from GitHub and got the same error. Could you please shed some light on a workaround?
17/11/10 14:10:30 INFO SparkContext: Created broadcast 0 from textFile at loadData.scala:12
[error] (run-main-0) java.lang.IllegalArgumentException: Can not create a Path from an empty string
[error] java.lang.IllegalArgumentException: Can not create a Path from an empty string
[error] at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
[error] at org.apache.hadoop.fs.Path.<init>
[debug] Interrupting remaining threads (should be all daemons).
[debug] Interrupting thread New I/O worker #1
[debug] Interrupted New I/O worker #1
[debug] Interrupting thread qtp2016764812-49 Acceptor0 SocketConnector@0.0.0.0:38375
[debug] Interrupted qtp2016764812-49 Acceptor0 SocketConnector@0.0.0.0:38375
[debug] Interrupting thread shuffle-client-0
[debug] Interrupted shuffle-client-0
[debug] Interrupting thread qtp2016764812-56
[debug] Interrupted qtp2016764812-56
[debug] Interrupting thread New I/O boss #3
[debug] Interrupted New I/O boss #3
[debug] Interrupting thread Thread-3
[debug] Interrupted Thread-3
[debug] Interrupting thread qtp1820541255-65
[debug] Interrupted qtp1820541255-65
[debug] Interrupting thread SPARK_CONTEXT cleanup timer
[debug] Interrupted SPARK_CONTEXT cleanup timer
[debug] Interrupting thread sparkDriver-akka.actor.default-dispatcher-5
[debug] Interrupted sparkDriver-akka.actor.default-dispatcher-5
[debug] Interrupting thread qtp1820541255-60 Acceptor0 SelectChannelConnector@0.0.0.0:4040
[error] java.lang.RuntimeException: Nonzero exit code: 1
[error] at sbt.Run$.executeTrapExit(Run.scala:120)
[error] at sbt.Run.run(Run.scala:73)
[error] at sbt.Defaults$.$anonfun$bgRunTask$5(Defaults.scala:1152)
[error] at sbt.Defaults$.$anonfun$bgRunTask$5$adapted(Defaults.scala:1147)
[error] at sbt.internal.BackgroundThreadPool.$anonfun$run$1(DefaultBackgroundJobService.scala:359)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:282)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 116 s, completed Nov 10, 2017 2:10:36 PM
17/11/10 14:10:37 INFO DiskBlockManager: Shutdown hook called
17/11/10 14:10:37 INFO ShutdownHookManager: Shutdown hook called
17/11/10 14:10:37 INFO ShutdownHookManager: Deleting directory /tmp/spark-73df991b-101f-44df-b410-31db9125cf94
I used Hadoop 2.6.0 and Spark 1.5.2.
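For what it's worth, the failure comes from the textFile call at loadData.scala:12: Hadoop's Path constructor throws this exact IllegalArgumentException whenever it is handed an empty string, so the input path being passed in most likely resolves to "". A minimal sketch that reproduces the same failure (hypothetical names, local mode only; it simply feeds an empty path to textFile):

import org.apache.spark.{SparkConf, SparkContext}

object EmptyPathRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("empty-path-repro").setMaster("local[1]"))
    // Hypothetical stand-in for wherever loadData.scala gets its input path from;
    // if it resolves to "", Hadoop rejects it with "Can not create a Path from an empty string".
    val inputPath = sys.props.getOrElse("smote.input", "")
    val data = sc.textFile(inputPath)
    // The exception surfaces here, when the underlying HadoopRDD resolves its input paths
    // while computing partitions for the first action.
    println(data.first())
    sc.stop()
  }
}

So the thing to check is that whatever feeds the path (argument, system property, config file) is actually set before the job runs.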
@Cherryko Could you please provide your code? It would be useful for reproducing the issue. What version of the spark-salesforce connector are you using?
Hi @springml,
I am trying to connect from R to Hive running on a Google Dataproc cluster. I can establish connectivity with master set to "local", but when I connect with "yarn-cluster" I get the error below. Please help me find the root cause.
Error:
Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>
The code throws java.lang.IllegalArgumentException. Could you please shed some light on a workaround?
Below is the usage:
val df = spark.read.
  format("com.springml.spark.sftp").
  option("host", "ftp-server").
  option("username", "uname").
  option("password", "pwd").
  option("fileType", "csv").
  option("inferSchema", "true").
  option("copyLatest", "true").
  option("inferSchema", "false").
  option("tempLocation", "/tmp").
  load("/file-location/file.CSV")
Output:
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:244)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$29.apply(SparkContext.scala:1013)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$29.apply(SparkContext.scala:1013)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:179)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1368)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.first(RDD.scala:1367)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.findFirstLine(CSVFileFormat.scala:206)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:60)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at com.springml.spark.sftp.DatasetRelation.read(DatasetRelation.scala:44)
at com.springml.spark.sftp.DatasetRelation.<init>(DatasetRelation.scala:29)
at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:84)
at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:43)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
... 47 elided