neolunar7 closed this issue 4 years ago
I'm not clear on what you mean by "check the connection". Is there something specific you're trying to do that's not working? Are you seeing any errors? etc.
No, it's not a problem like that. I have another cluster with 1 master and 0 slaves. The cluster I set up with Flintrock has 1 master and 4 slaves. However, the speed seems almost the same, so I want to check whether the slave nodes are actually working. Did I make myself clear enough? Thanks.
I'm not sure what you mean by "the speed seems the same". But if you navigate to the "Executors" tab in the Spark web UI and see all the cluster nodes in there, then the cluster is working fine.
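The same check can be done from the command line, assuming a standalone cluster like the one Flintrock launches (the hostname and app id below are placeholders; the standalone master UI defaults to port 8080 and a running application's UI to port 4040):

```shell
# Ask the standalone master for its status page; the "Workers" section
# should list all 4 slaves in state ALIVE.
curl http://<master-hostname>:8080

# While an application is running, the monitoring REST API lists the
# application and its executors as JSON (one entry per worker, plus the driver).
curl http://<master-hostname>:4040/api/v1/applications
curl http://<master-hostname>:4040/api/v1/applications/<app-id>/executors
```

If the executors list only shows the driver, the workers never registered with the master.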
The speed of any job can be influenced by a million factors. If a larger cluster isn't making your job run faster, then it may be that your data isn't partitioned sufficiently, or that the data is small enough that the cluster overhead overshadows any actual data processing, or that your calculation cannot be distributed across the cluster, etc.
I'll check on that, thanks. One more thing: the README only mentions spark-submit, but will spark-shell work the same way?
If you're referring to the --packages option, then yes, spark-submit and spark-shell take the same options.
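For example, the same flag can be passed to either launcher (the hadoop-aws coordinate and the script name my_job.py below are just illustrations; match the version to your cluster's Hadoop build):

```shell
# Fetch the S3A connector and its dependencies for a batch job...
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.7 my_job.py

# ...or for an interactive shell; the option is handled identically.
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.7
```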
If you have an actual issue with your cluster that you think is related to Flintrock, feel free to open a new issue or update this one. I will reopen it.
I used Flintrock to set up a Spark EC2 cluster, but I have no idea how to check whether the connection between the master node and the slave nodes is set up properly. Checking the Hadoop and Spark conf files, the masters and slaves files seem to identify the EC2 instances. Then, does the spark-defaults.conf file not matter? I added spark.jars.packages and spark.hadoop.fs.s3a.impl to make use of the S3A file system, but for the other settings, I have no idea.
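For context, the spark-defaults.conf lines I mean are of this shape (the hadoop-aws version here is only an example and must match the Hadoop version on the cluster; on recent Hadoop versions the fs.s3a.impl line is often unnecessary because the S3A implementation class is picked up automatically):

```
spark.jars.packages       org.apache.hadoop:hadoop-aws:2.7.7
spark.hadoop.fs.s3a.impl  org.apache.hadoop.fs.s3a.S3AFileSystem
```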