nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars, 116 forks

Checking the connection between the master and slaves. #301

Closed. neolunar7 closed this issue 4 years ago

neolunar7 commented 4 years ago

I used Flintrock to set up a Spark cluster on EC2, but I don't know how to check that the connection between the master node and the slave nodes is set up properly. Looking at the Hadoop and Spark conf files, the master and slaves files seem to identify the EC2 instances correctly. Does that mean the spark-defaults.conf file doesn't matter? I added spark.jars.packages and spark.hadoop.fs.s3a.impl so I can use the S3 file system, but I don't know what other settings, if any, are needed.

nchammas commented 4 years ago

I'm not clear on what you mean by "check the connection". Is there something specific you're trying to do that's not working? Are you seeing any errors? etc.

neolunar7 commented 4 years ago

No, it's not a problem like that. I have another cluster with 1 master and 0 slaves. The cluster I set up with Flintrock has 1 master and 4 slaves. However, jobs run at almost the same speed on both, so I want to check whether the slave nodes are actually doing any work. Is that clear enough? Thanks.

nchammas commented 4 years ago

I'm not sure what you mean by "the speed seems the same". But if you navigate to the "Executors" tab in the Spark web UI and see all the cluster nodes in there, then the cluster is working fine.
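The same check can be scripted rather than done by eye. The sketch below assumes the application's REST API is reachable (it is served alongside the web UI) and that the `/api/v1/applications/<app-id>/executors` endpoint returns a JSON list in which the driver appears with the id `driver`. The helper names `worker_executors` and `fetch_executors` and the sample payload are made up for illustration, not part of Flintrock or Spark.

```python
import json
from urllib.request import urlopen


def worker_executors(executors):
    """Return the active executors from a REST API payload, excluding the driver entry."""
    return [
        e for e in executors
        if e.get("id") != "driver" and e.get("isActive", True)
    ]


def fetch_executors(base_url, app_id):
    """Fetch the executor list for one application from the Spark REST API."""
    with urlopen(f"{base_url}/api/v1/applications/{app_id}/executors") as resp:
        return json.load(resp)


# Sample payload shaped like the REST API response (hosts and values are invented).
sample = [
    {"id": "driver", "hostPort": "10.0.0.1:40001", "isActive": True, "totalCores": 0},
    {"id": "0", "hostPort": "10.0.0.2:40002", "isActive": True, "totalCores": 2},
    {"id": "1", "hostPort": "10.0.0.3:40003", "isActive": True, "totalCores": 2},
]

workers = worker_executors(sample)
hosts = sorted(e["hostPort"].split(":")[0] for e in workers)
print(len(workers), hosts)
```

Against a live application, a call such as `fetch_executors("http://<master-host>:4040", "<app-id>")` would supply the list (4040 is the default port for the application UI). If the number of distinct hosts matches the number of slaves, the executors have registered.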

The speed of any job can be influenced by a million factors. If a larger cluster isn't making your job run faster, then it may be that your data isn't partitioned sufficiently, or that the data is small enough that the cluster overhead overshadows any actual data processing, or that your calculation cannot be distributed across the cluster, etc.
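To put rough numbers on the last two points, here is a back-of-the-envelope sketch based on Amdahl's law. The `overhead_fraction` term is an added assumption standing in for cluster overhead, not part of the classical formula, and the fractions used are illustrative rather than measured from any real job.

```python
def estimated_speedup(parallel_fraction, workers, overhead_fraction=0.0):
    """Estimate speedup over a single node using Amdahl's law.

    parallel_fraction: share of the single-node run time that can be distributed.
    overhead_fraction: assumed fixed cluster overhead, as a share of the
        single-node run time (an illustrative extension, not part of Amdahl's law).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers + overhead_fraction)


# How much a 4-worker cluster could help for different job shapes.
for p in (0.95, 0.5, 0.1):
    print(f"parallel fraction {p}: {estimated_speedup(p, 4):.2f}x on 4 workers")
```

Under these assumptions, a job that is only half parallelizable tops out at 1.6x on four workers, and a modest fixed overhead can erase that gain entirely.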

neolunar7 commented 4 years ago

I'll check on that, thanks. One more thing: the README only mentions spark-submit, but will spark-shell work the same way?

nchammas commented 4 years ago

If you're referring to the --packages option, then yes, spark-submit and spark-shell take the same options.

If you have an actual issue with your cluster that you think is related to Flintrock, feel free to open a new issue or update this one. I will reopen it.