Following up on @heathermiller's comments here, I think this fixes `s3a://` for folks using `spark-submit --deploy-mode cluster` from a remote client.
It's difficult to come up with a completely seamless way to set up `s3a://` due to all the incompatibilities you have to work around, but I think that instructing users to use `--packages ...:hadoop-aws` is the best I can do at this time at the intersection of "easy to maintain" and "works for the user".
I don't have a Scala toolchain set up on my machine, so I couldn't directly confirm that this eliminates the need to manually copy AWS-related jars, as described here. PySpark does not support cluster deploy mode with a standalone master, so I couldn't test it that way either. But I'd expect this to work, since it works when I interactively start a PySpark shell from the master.
@heathermiller - If you have the time to test that calling `spark-submit --deploy-mode cluster --packages ...:hadoop-aws` eliminates the need to manually install `hadoop-aws-2.7.2.jar` and `aws-java-sdk-1.7.4.jar` on the cluster, that would be great. No worries otherwise.
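For reference, here is a rough sketch of the shape of the invocation I have in mind, written as a small Python helper that only assembles the command line. The helper name `build_submit_command`, the master URL, the jar name, and the bucket path are all made up for illustration, and the package coordinate is left as the same `...:hadoop-aws` placeholder used above, so substitute the full coordinate that matches your Hadoop build. Nothing here has been run against a real cluster.

```python
import shlex


def build_submit_command(master_url, packages, app_jar, app_args=()):
    """Assemble a spark-submit argv for cluster deploy mode.

    Hypothetical helper for illustration only; it builds the command
    and does not launch anything.
    """
    if not packages:
        raise ValueError("at least one package coordinate is required")
    return [
        "spark-submit",
        "--master", master_url,
        "--deploy-mode", "cluster",
        # --packages takes a comma-separated list of Maven coordinates.
        "--packages", ",".join(packages),
        app_jar,
        *app_args,
    ]


# Placeholder values (assumptions, not taken from a real setup).
cmd = build_submit_command(
    master_url="spark://master-host:7077",
    packages=["...:hadoop-aws"],  # elided coordinate, fill in your own
    app_jar="my-app.jar",
    app_args=["s3a://my-bucket/input"],
)
print(shlex.join(cmd))
```

If the job can read from the `s3a://` path without the two jars having been copied to the cluster by hand, that would confirm the fix.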
Related to #180.