pishen / sbt-lighter

SBT plugin for Apache Spark on AWS EMR
Apache License 2.0

Cluster never appears. #20

Closed: vasion closed this issue 7 years ago

vasion commented 7 years ago

I am trying to spin up a long-running cluster.

I get this:

```
[info] Your new cluster's id is j-V9DQSH3QKAI2, you may check its status on AWS console.
[success] Total time: 2 s, completed Oct 4, 2017 11:20:52 AM
[info] Found cluster j-V9DQSH3QKAI2, start monitoring.
....
[info] Cluster terminated without error.
[success] Total time: 17 s, completed Oct 4, 2017 11:21:09 AM
```

But the cluster is nowhere to be found in the AWS console. It seems like it's not spinning up at all.

Can you point me in the right direction for debugging this?

pishen commented 7 years ago

@vasion Could you show me the command you executed? Also, could you check that your terminal is using the same identity as the one you use to log in to the AWS console? By default, EMR only shows cluster information to the person who created it.
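For reference, one way to confirm which identity the terminal credentials resolve to is the STS `GetCallerIdentity` call. A minimal sketch using the AWS Java SDK v1 (the `aws-java-sdk` artifact that appears in the build below); the object name `WhoAmI` is just for illustration:

```scala
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
import com.amazonaws.services.securitytoken.model.GetCallerIdentityRequest

object WhoAmI extends App {
  // Resolves credentials via the default chain: env vars, system
  // properties, ~/.aws/credentials, then EC2 instance profile.
  val sts = AWSSecurityTokenServiceClientBuilder.defaultClient()
  val identity = sts.getCallerIdentity(new GetCallerIdentityRequest())
  println(s"Account: ${identity.getAccount}")
  println(s"ARN:     ${identity.getArn}")
}
```

If the printed ARN is not the IAM user you use in the web console, the cluster may exist but simply not appear in your console view.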

vasion commented 7 years ago

Here is the command: `sbt 'sparkCreateCluster' sparkMonitor`

The AWS account is the same as the one for the credentials, and I can see other clusters my colleagues have created. By the way, the `Cluster terminated without error` message appears very quickly; I don't know if that is a clue.

Here is the relevant part of my build.sbt:

```scala
libraryDependencies ++= Seq(
  "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.2.0" % "provided",
  "org.apache.hadoop" % "hadoop-aws" % "2.8.1" % "provided",
  "com.amazonaws" % "aws-java-sdk" % "1.11.194" % "provided",
  "org.scalatest" %% "scalatest" % "3.0.3" % "provided",
  "org.scalamock" %% "scalamock-scalatest-support" % "3.6.0" % "provided"
)

// EMR setup for command line clusters

val username = System.getProperty("user.name")

sparkAwsRegion := "us-east-1"

sparkJobFlowInstancesConfig := sparkJobFlowInstancesConfig.value.withEc2KeyName("xxxxxx")

sparkSecurityGroupIds := Some(Seq("sg-xxxxxxx"))

sparkS3JarFolder := s"s3://xxxxxxxxx/$username/"

sparkInstanceCount := 3

sparkClusterName := s"SBTCluster$username"

sparkEmrRelease := "emr-5.8.0"

sparkInstanceType := "m3.xlarge"

// Comment out for an on-demand cluster:
// sparkInstanceBidPrice := Some("0.10")

import com.amazonaws.services.elasticmapreduce.model.Application
val applications = Seq("Spark", "Ganglia", "Hadoop").map(a => new Application().withName(a))
sparkRunJobFlowRequest := sparkRunJobFlowRequest.value.withApplications(applications: _*)

import com.amazonaws.services.elasticmapreduce.model.Tag
sparkRunJobFlowRequest := sparkRunJobFlowRequest.value.withTags(new Tag("Name", s"SBTCluster-$username"))

sparkRunJobFlowRequest := sparkRunJobFlowRequest.value.withLogUri(s"s3://xxxxxxxxxx/$username/")
```

pishen commented 7 years ago

@vasion Could you try adding this to your build.sbt:

```scala
sparkRunJobFlowRequest := sparkRunJobFlowRequest.value.withVisibleToAllUsers(true)
```
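For context, `VisibleToAllUsers` is a field on the EMR `RunJobFlow` request: when set to true, the cluster is visible to all IAM users under the same AWS account, rather than only to the identity that created it.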

vasion commented 7 years ago

Yay. I can see it. And there are errors. I will debug tomorrow and write back.

pishen commented 7 years ago

@vasion If you can see it after adding withVisibleToAllUsers, it is most likely an identity problem. Maybe you can check whether your AWS_ACCESS_KEY_ID belongs to the IAM user you use to log in to the AWS web console.

vasion commented 7 years ago

withVisibleToAllUsers allowed me to see the cluster. The reason it was failing was that the security group I was specifying was in another region. Thanks for your help, I really appreciate it.
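For anyone who hits the same failure: one way to confirm that a security group id actually exists in the region the plugin targets is the EC2 `DescribeSecurityGroups` call. A minimal sketch, reusing the placeholder group id and region from the build.sbt above:

```scala
import com.amazonaws.regions.Regions
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder
import com.amazonaws.services.ec2.model.DescribeSecurityGroupsRequest
import scala.collection.JavaConverters._

object CheckSecurityGroup extends App {
  // Query the same region that sparkAwsRegion points at; a group id
  // created in another region fails here with InvalidGroup.NotFound.
  val ec2 = AmazonEC2ClientBuilder.standard()
    .withRegion(Regions.US_EAST_1)
    .build()
  val request = new DescribeSecurityGroupsRequest().withGroupIds("sg-xxxxxxx")
  val result = ec2.describeSecurityGroups(request)
  result.getSecurityGroups.asScala.foreach { sg =>
    println(s"${sg.getGroupId} ${sg.getGroupName} vpc=${sg.getVpcId}")
  }
}
```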