It is indeed hard to tell what went wrong here from just that error. What I usually do in cases like this is instruct Flintrock not to terminate the instances after a failed launch. Then I log in to the master and look at the relevant logs. In this case they'd be under hadoop/logs or something like that.
Do you have the same issues if you run Flintrock from master?
In my case, the "HDFS health check failed" error was due to the admin box that launched Flintrock in an EC2 VPC being unable to reach the master on port 50070. To work around this, I first edited the Flintrock cluster security group in the AWS console and added an inbound TCP rule for port 50070 from "Anywhere". This had the added bonus of allowing my own browser to reach the :50070 DFS health page. Note that you will need to add port 8080 as well for the Spark health check.
Alternatively, attaching the Flintrock cluster security group to the admin box also works.
The Flintrock base group does indeed add an entry to open these ports to the public IP address of the admin box, but because of how EC2 VPCs work, the request arrives from the private (internal) IP address of the admin box.
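In case it helps to script this rather than click through the console, here is a rough boto3 sketch of the same two rules; the region and the security group ID are placeholders for whatever Flintrock created for your cluster:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

# Open the HDFS NameNode UI (50070) and the Spark master UI (8080) to "Anywhere",
# mirroring the console change described above. The group ID is a placeholder
# for the Flintrock cluster security group.
for port in (50070, 8080):
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",
        IpProtocol="tcp",
        FromPort=port,
        ToPort=port,
        CidrIp="0.0.0.0/0",
    )
```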
Thanks for the answers! I have been able to identify and solve the problem: it was exactly what @kavika-1 said. After adding those rules to the Flintrock security group in the AWS console, the cluster now sets up properly. Cheers!
Thanks for chiming in @kavika-1 and glad you figured it out @DalcaTN.
Do either of you know how the IP address of the inbound request appears in your case? I wouldn't want to add new rules to Flintrock to allow traffic from anywhere, but perhaps adding rules to allow traffic from 10.* might be appropriate here.
I'm not sure why a private IP address shows up in the first place for you (if you have any insight on this, I'd appreciate it since others have reported similar issues in the past), but as a compromise solution allowing traffic from private addresses should be fine.
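To sketch what I mean, the rule could look something like this with boto3, restricted to 10.0.0.0/8 rather than 0.0.0.0/0 (the group ID is a placeholder and the ports are the ones discussed above; this is just an illustration, not what Flintrock currently does):

```python
import boto3

ec2 = boto3.client("ec2")

# Allow the health-check ports only from private 10.x.x.x addresses,
# rather than from anywhere. The group ID is a placeholder.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": "10.0.0.0/8",
                          "Description": "private admin box traffic"}],
        }
        for port in (50070, 8080)
    ],
)
```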
I can confirm that the inbound request IP (from the point of view of the master) is the private IP of the admin box. The rules get created with the public IP.
My guess is that in our case, @DalcaTN and I probably both have "Auto-assign Public IP: yes" on our VPC subnet, so the instances get two addresses: one private and one public.
I think spark-ec2 attempted to address such issues with the --private-ips flag?

    --private-ips    Use private IPs for instances rather than public if
                     VPC/subnet requires that.
Maybe a hint here: https://stackoverflow.com/questions/42654336/how-do-i-resolve-failed-to-determine-hostname-of-instance-error-using-spark-ec
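If you want to confirm the auto-assign setting on your subnet without digging through the console, a quick boto3 check works (the subnet ID below is a placeholder for the subnet your cluster launches into):

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder subnet ID; use the subnet your cluster launches into.
subnet = ec2.describe_subnets(SubnetIds=["subnet-0123456789abcdef0"])["Subnets"][0]
print("Auto-assign public IP:", subnet["MapPublicIpOnLaunch"])
```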
Unfortunately I am not as experienced as @kavika-1, but I can tell you that I did not set Auto-assign Public IP to anything. So if "yes" is the default value, then it is the same for me.
Ah OK. I think the key here is what you pointed out earlier @kavika-1:

> the admin box that launched Flintrock in an EC2 VPC
@DalcaTN - Are you also running Flintrock from a box running in EC2?
In this case I wonder if it's better for Flintrock to avoid these kinds of networking issues by querying the healthcheck HTTP endpoints locally from the cluster master, rather than remotely from the Flintrock client.
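Roughly what I have in mind is something like the sketch below: SSH into the master and hit the web UIs on localhost, so the client's network path to ports 50070/8080 stops mattering. The hostname, user, and key path are placeholders, and this is only an illustration of the idea, not how Flintrock does it today:

```python
import paramiko

# Placeholders: master hostname, SSH user, and key path.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("ec2-203-0-113-10.compute-1.amazonaws.com",
               username="ec2-user",
               key_filename="/path/to/key.pem")

# Query each web UI from the master itself, so only SSH (port 22)
# needs to be reachable from the Flintrock client.
for name, port in (("HDFS", 50070), ("Spark", 8080)):
    _, stdout, _ = client.exec_command(
        f"curl -s -o /dev/null -w '%{{http_code}}' http://localhost:{port}")
    print(f"{name} health check HTTP status:", stdout.read().decode().strip())

client.close()
```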
@nchammas I am not 100% sure what you mean by "box": I am just using Flintrock locally on my laptop to launch the remote cluster on EC2 instances.
OK. I thought maybe you were running Flintrock from EC2.
(By "box running in EC2", I just mean an instance running in EC2.)
Hello, as the title says I am unable to launch a cluster using flintrock. I am using:
My configuration file is:
When I try to launch the cluster using `flintrock launch my-cluster`, the output is:
I am having a hard time understanding what the problem is here. Any ideas? Thank you!