Closed — cwithrowss closed this issue 6 years ago
What do you see in the Hadoop logs if you don't terminate the cluster and flintrock login to the master? It should give you a clue as to why the master is failing to come up. Or maybe the HDFS master is coming up fine and it's just that Flintrock is unable to reach it via its web API for a health check.
Thanks for your help. The logs are there, but there are lots of errors! Some are connection related, but I'd like to share the first one, since it confuses me the most:
2018-07-02 22:20:56,528 WARN [main] namenode.FileJournalManager (FileJournalManager.java:startLogSegment(129)) - Unable to start log segment 1 at /media/ephemeral0/hadoop/dfs/name/current/edits_inprogress_0000000000000000001: No space left on device
2018-07-02 22:20:56,528 ERROR [main] common.Storage (NNStorage.java:reportErrorsOnDirectory(850)) - Error reported on storage directory Storage Directory /media/ephemeral0/hadoop/dfs/name
2018-07-02 22:20:56,528 WARN [main] common.Storage (NNStorage.java:reportErrorsOnDirectory(855)) - About to remove corresponding storage: /media/ephemeral0/hadoop/dfs/name
2018-07-02 22:20:56,529 ERROR [main] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(410)) - Error: starting log segment 1 failed for (journal JournalAndStream(mgr=FileJournalManager(root=/media/ephemeral0/hadoop/dfs/name), stream=null))
java.io.IOException: No space left on device
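A quick way to confirm the device really is full is to check free space on the filesystem holding the name directory from the log. A small sketch using Python's standard library (the path is taken from the log above; the `free_space_mb` helper is just for illustration):

```python
import shutil

def free_space_mb(path: str) -> float:
    """Return free space, in megabytes, on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 * 1024)

# On the master this would be run against the HDFS name directory, e.g.:
#   free_space_mb("/media/ephemeral0/hadoop/dfs/name")
print(f"free on /: {free_space_mb('/'):.1f} MB")
```

If this reports roughly 0 MB for the mount, the "No space left on device" error is literal rather than a permissions or configuration problem.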
I don't know if this means it's not trying to store things in the right place, or what. Without getting into a potentially separate issue, I'm also not sure how to confirm from the logs whether HDFS actually came up.
That's a good lead.
What instance type and AMI ID are you trying to launch with?
instance-type: m5.large
ami: ami-97785bed  # Amazon Linux, us-east-1
Ah, that's why. Try m3.large instead.

For some reason, m5.large exposes a tiny 1 MB ephemeral drive, which Flintrock then automatically tries to use for HDFS. I'll have to investigate the best way to handle this. If an instance does not have ephemeral storage, Flintrock defaults to using the root EBS volume.
Thanks! It's working now. I had used m5.large because the AWS site said m3 was deprecated, but I see it's still supported.
Great! I'll keep this issue open so I can fix this behavior with m5 instances, because Flintrock should ideally work fine with those too.
I have been unable to launch any clusters with working HDFS. I can launch clusters without installing Hadoop, and I have tried direct downloads from several Apache mirrors (which seem to download fine in a browser), to no avail. After it purportedly installs HDFS and Spark, it times out while configuring the HDFS master, waiting for it to come up. This happens every time, and I'm only launching with 1 slave.

I don't have a lot of dev experience, so I apologize if I'm reporting this poorly.
debug info: