springml / spark-sftp

Spark connector for SFTP
Apache License 2.0

Add support for WASB(s) (Azure Blobs) scheme. #58

Open nperumal opened 5 years ago

nperumal commented 5 years ago

It would be nice to have support for WASB(s) scheme.

FurcyPin commented 4 years ago

We found the cause: the code currently checks whether FileSystem.getScheme() returns "hdfs" to decide whether it is running in local or distributed mode.

However, when running on cloud distributions (AWS EMR, Azure HDInsight, Google Dataproc, and probably Databricks too), the scheme is not "hdfs" but "s3", "wasbs", "gs", etc., so the check wrongly concludes that it is running in local mode.

The above pull request fixes that. (We only tested it on WASB(s), but we believe it should fix the issue for other clouds as well.)
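
To make the difference concrete, here is a minimal sketch of the two checks, written in Python. It is not the connector's code and not necessarily what the pull request does. The function names `is_distributed_old` and `is_distributed`, the `LOCAL_SCHEMES` set, and the sample URIs are all hypothetical. The sketch assumes the default filesystem is given as a URI string, and that anything other than `file` or an empty scheme counts as distributed.

```python
from urllib.parse import urlparse

# Assumption: only these schemes mean "local filesystem".
LOCAL_SCHEMES = {"", "file"}


def is_distributed_old(default_fs_uri: str) -> bool:
    """Mimics the check described above: only "hdfs" counts as distributed."""
    return urlparse(default_fs_uri).scheme.lower() == "hdfs"


def is_distributed(default_fs_uri: str) -> bool:
    """Hypothetical alternative: any non-local scheme counts as distributed."""
    return urlparse(default_fs_uri).scheme.lower() not in LOCAL_SCHEMES


if __name__ == "__main__":
    # Illustrative URIs only; the account, bucket and host names are made up.
    for uri in (
        "hdfs://namenode:8020/",
        "wasbs://container@account.blob.core.windows.net/",
        "s3://bucket/",
        "gs://bucket/",
        "file:///tmp",
    ):
        print(uri, is_distributed_old(uri), is_distributed(uri))
```

With the old check, only the `hdfs://` URI is treated as distributed. With the alternative, every URI except `file:///tmp` is. This approach inverts the test so that new cloud schemes need no code change, at the cost of assuming that every non-`file` scheme is a distributed store.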