springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Exception in thread "main" java.lang.NoSuchMethodError: com.springml.sftp.client.SFTPClient #46

Closed — nurzhannogerbek closed this issue 5 years ago

nurzhannogerbek commented 5 years ago

In my Spark application I need to convert a DataFrame to a .csv file and upload it to a remote SFTP server. I decided to use the spark-sftp library for this task.

My sbt file looks like this:

scalaVersion := "2.11.8"

val sparkVersion = "2.3.0"

resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven"

libraryDependencies += "com.springml" % "spark-sftp_2.11" % "1.1.3"
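(Editor's note, a hedged aside: when two libraries drag in different versions of the same transitive dependency, sbt lets you pin a single version explicitly. The version below is the one the maintainer mentions later in this thread; I have not verified it against the POM of spark-sftp_2.11 1.1.3, so treat it as an illustration of the technique, not a confirmed fix.)

```scala
// build.sbt — force a single version of the transitive sftp.client so the
// constructor signature spark-sftp was compiled against is the one actually
// on the runtime classpath. Version number is an assumption; check your
// dependency tree (e.g. `sbt dependencyTree`) before pinning.
dependencyOverrides += "com.springml" % "sftp.client" % "1.0.3"
```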

The project compiles without any errors, but when I run the following code it raises an error.

import spark.sqlContext.implicits._

val df: DataFrame  = Seq(
  ("Alex", "2018-01-01 00:00:00", "2018-02-01 00:00:00", "OUT"),
  ("Bob", "2018-02-01 00:00:00", "2018-02-05 00:00:00", "IN"),
  ("Mark", "2018-02-01 00:00:00", "2018-03-01 00:00:00", "IN"),
  ("Mark", "2018-05-01 00:00:00", "2018-08-01 00:00:00", "OUT"),
  ("Meggy", "2018-02-01 00:00:00", "2018-02-01 00:00:00", "OUT")
).toDF("NAME", "START_DATE", "END_DATE", "STATUS")

println("Count: " + df.count()) // Prints "Count: 5" in the console

// The following write raises the error
df.write.format("com.springml.spark.sftp")
  .option("host", "XXXX")
  .option("username", "XXXX")
  .option("password", "XXXX")
  .option("fileType", "csv")
  .option("delimiter", ";")
  .option("codec", "bzip2")
  .save("/reports/daily.csv")

ERROR:

Exception in thread "main" java.lang.NoSuchMethodError: com.springml.sftp.client.SFTPClient.<init>(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;I)V
        at com.springml.spark.sftp.DefaultSource.getSFTPClient(DefaultSource.scala:186)
        at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:122)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
        at report.CALL.runTask(CALL.scala:42)
        at JobController.runJob(JobController.scala:38)
        at MainApp$.main(MainApp.scala:74)
        at MainApp.main(MainApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What is the reason for this problem?
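(Editor's note: a `NoSuchMethodError` at runtime usually means the JVM loaded the class from a jar whose version differs from the one the caller was compiled against. A small diagnostic sketch for anyone debugging this — in the real application one would pass `classOf[com.springml.sftp.client.SFTPClient]`; a stock Scala class is used below only so the snippet runs anywhere:)

```scala
object JarLocator {
  // Returns the jar (or directory) a class was actually loaded from,
  // which is the first thing to check on a NoSuchMethodError.
  def jarOf(cls: Class[_]): String =
    Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("(no code source: bootstrap/platform class)")

  def main(args: Array[String]): Unit = {
    // In the failing app: println(jarOf(classOf[com.springml.sftp.client.SFTPClient]))
    println(jarOf(classOf[scala.collection.immutable.List[_]]))
  }
}
```

If the printed location is an old cached jar, evicting or pinning that artifact resolves the error.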

samuel-pt commented 5 years ago

@NogerbekNurzhan

Could you please try cleaning "com.springml" % "sftp.client" % "1.0.3" from your local repository and running the code again?
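(Editor's note: a sketch of what that cleanup might look like on a typical setup — the cache paths below are the sbt/Ivy and Maven defaults and are an assumption; adjust them to your machine:)

```shell
# Remove the cached sftp.client artifact so sbt re-resolves it fresh.
# Default Ivy and Maven local-cache locations are assumed here.
rm -rf ~/.ivy2/cache/com.springml/sftp.client
rm -rf ~/.m2/repository/com/springml/sftp.client
```

Then run `sbt clean update` and re-run the job so the artifact is fetched again.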

nurzhannogerbek commented 5 years ago

@samuel-pt Thank you for your answer! I removed the com.springml.sftp.client library as you recommended, and also dropped the codec option. The final code looks like this:

df.write.format("com.springml.spark.sftp")
  .option("host", "XXXX")
  .option("username", "XXXX")
  .option("password", "XXXX")
  .option("fileType", "csv")
  .option("delimiter", ";")
  .option("charset", "windows-1251")
  .mode("append")
  .save("/reports/daily.csv")

After rebuilding the project I noticed that sbt reinstalled the com.springml.sftp.client library. The project now runs without the previous error.

There is only one remaining problem. For some reason non-Latin letters (Cyrillic in my case) are displayed incorrectly in the final csv file. Inside the csv file I see garbled symbols such as: Село. Why doesn't the charset option work?
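(Editor's note: garbled output of this shape is classic mojibake — text written in one encoding but read back in another. A minimal sketch reproducing the effect in plain Scala, assuming the file's bytes are UTF-8 while the reader expects windows-1251; the exact garbled string depends on which direction the mismatch runs:)

```scala
import java.nio.charset.Charset

object CharsetDemo {
  // Encode a string as UTF-8 bytes, then decode those bytes with the wrong
  // charset — this models how a charset option that is silently ignored
  // produces the "strange symbols" seen in the csv file.
  def garble(s: String, wrong: String = "windows-1251"): String =
    new String(s.getBytes(Charset.forName("UTF-8")), Charset.forName(wrong))

  def main(args: Array[String]): Unit =
    println(garble("Село")) // Cyrillic input comes out mangled
}
```

ASCII text survives the mismatch unchanged (both charsets agree below 0x80), which is why only the non-Latin letters look broken.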

nurzhannogerbek commented 5 years ago

@samuel-pt as I understand from your comment in this post, the charset option is not supported right now, correct? Are there any plans to add this feature in a future release? As far as I can tell, it should just be a matter of passing this option through here.

samuel-pt commented 5 years ago

@NogerbekNurzhan - Sorry, we got pulled into something of higher priority. If possible, could you create a PR with the required changes?