nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Configuring HDFS Master timeout #292

Closed · 13k75 closed this issue 4 years ago

13k75 commented 4 years ago

Hi Nicholas,

When starting my cluster, the HDFS configuration times out. Unlike the previous issue about m5.large instances, though, the Hadoop logs don't show anything amiss: the NameNode and SecondaryNameNode are starting and stopping normally.

Here is my config file:

```yaml
services:
  spark:
    version: 2.4.4

  hdfs:
    version: 3.1.2

provider: ec2

providers:
  ec2:
    key-name: spark_cluster
    identity-file: /home/kasra/distributed-setup/spark_cluster.pem
    instance-type: t2.micro
    region: us-west-2
    ami: ami-04b762b4289fba92b # amazon linux 2
    user: ec2-user
    tenancy: default  # default | dedicated
    ebs-optimized: no  # yes | no
    instance-initiated-shutdown-behavior: terminate  # terminate | stop

launch:
  num-slaves: 1
  install-hdfs: True
  install-spark: True

debug: true
```

And I'm happy to provide the Hadoop logs too if you want them, though like I said they don't show any errors or warnings.

I would appreciate any help or insight you might have. Thanks!

nchammas commented 4 years ago

I don't know that Spark will work out of the box with Hadoop 3+. I would stick to Flintrock's default of Hadoop 2.8.5 and see if you still have any issues.
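If it helps, following that suggestion is a one-line change to the `services` section of the config file above (version number taken from the comment, not verified against your setup):

```yaml
services:
  spark:
    version: 2.4.4

  hdfs:
    version: 2.8.5  # Flintrock's default Hadoop line, per the suggestion above
```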

13k75 commented 4 years ago

Yes, that's exactly it! Hadoop 3+ switches a bunch of default ports. In particular, the NameNode web UI moved from port 50070 to 9870.
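As a quick sketch of the port change (the function name and version-to-port table here are illustrative, not part of Flintrock or Hadoop):

```python
# Default NameNode web UI port by Hadoop major version.
# Hadoop 3 moved several daemon ports; the NameNode UI went from 50070 to 9870.
NAMENODE_HTTP_PORT = {2: 50070, 3: 9870}

def namenode_http_port(hadoop_version: str) -> int:
    """Return the default NameNode web UI port for a Hadoop version string."""
    major = int(hadoop_version.split(".")[0])
    return NAMENODE_HTTP_PORT[major]

print(namenode_http_port("2.8.5"))  # -> 50070
print(namenode_http_port("3.1.2"))  # -> 9870
```

So a health check that probes port 50070 will time out against a Hadoop 3.x NameNode even though the daemon itself started cleanly, which matches the logs showing no errors.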