springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Loading files using a wildcard fails #53

Closed: SamambaMan closed this issue 5 years ago

SamambaMan commented 5 years ago

I'm trying to load txt files like the following:

data = spark.read.\
    format("com.springml.spark.sftp").\
    option("host", "---").\
    option("username", "---").\
    option("password", "---").\
    option("fileType", "csv").\
    option("delimiter", "|").\
    option("inferSchema", "true").\
    option("charset", "ISO-8859-1").\
    load("CIVIL/CIVIL_GERA_MP_*.TXT")

But I'm receiving this error:

Py4JJavaError: An error occurred while calling o114.load.
: 4: Copying multiple files, but destination is missing or a file.
    at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:780)
    at com.jcraft.jsch.ChannelSftp.get(ChannelSftp.java:750)
    at com.springml.sftp.client.SFTPClient.copyInternal(SFTPClient.java:168)
    at com.springml.sftp.client.SFTPClient.copy(SFTPClient.java:74)
    at com.springml.spark.sftp.DefaultSource.copy(DefaultSource.scala:212)
    at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:80)
    at com.springml.spark.sftp.DefaultSource.createRelation(DefaultSource.scala:41)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

Is it possible to load multiple files at once with wildcards?

samuel-pt commented 5 years ago

@SamambaMan Wildcards are not supported. However, you can pass the folder as the path, which will import all the files in that folder from the SFTP server.
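
If loading the whole folder pulls in more files than wanted, one possible workaround is to do the wildcard matching on the client side and load each matching file on its own. The sketch below is not part of spark-sftp: `match_remote_files` and `load_matching` are hypothetical helpers, `names` is assumed to be a listing of remote paths obtained separately (for example with an SFTP client), and `options` is assumed to be a dict of the same connector options used in the question.

```python
from fnmatch import fnmatchcase
from functools import reduce


def match_remote_files(names, pattern):
    """Return the names matching a shell-style wildcard, sorted.

    Hypothetical helper; matching is case-sensitive.
    """
    return sorted(name for name in names if fnmatchcase(name, pattern))


def load_matching(spark, names, pattern, options):
    """Load every matching file separately and union the results.

    Assumes `names` was listed beforehand and `options` holds the
    connector options (host, username, password, fileType, ...).
    """
    paths = match_remote_files(names, pattern)
    if not paths:
        raise ValueError(f"no remote file matches {pattern!r}")
    frames = [
        # A fresh reader per file; each load() gets a single concrete path.
        spark.read.format("com.springml.spark.sftp").options(**options).load(path)
        for path in paths
    ]
    # union() pairs columns by position, so all files must share one layout.
    return reduce(lambda left, right: left.union(right), frames)
```

A call might look like `load_matching(spark, names, "CIVIL/CIVIL_GERA_MP_*.TXT", options)`. Because each file is read separately, `inferSchema` could infer different column types per file; passing an explicit schema may be safer in that case.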