juanrh / sscheck

ScalaCheck for Spark
Apache License 2.0

Setup test cluster: Cubox 4x4 + ODROID C2 #50

Open juanrh opened 8 years ago

juanrh commented 8 years ago

Set up a test cluster with one Cubox 4x4 as master and two ODROID C2 boards as slaves. Try running the latest Hortonworks HDP on Ubuntu with only YARN installed, and only Ambari for monitoring (no Ganglia or Nagios), i.e. with the following Ambari blueprint:

{
  "host_groups" : [
    {
      "name" : "masters",
      "components" : [
        {
          "name" : "RESOURCEMANAGER"
        },
        {
          "name" : "APP_TIMELINE_SERVER"
        }
      ],
      "cardinality" : "1"
    },
    {
      "name" : "slaves",
      "components" : [
        {
          "name" : "NODEMANAGER"
        }
      ],
      "cardinality" : "*"
    }
  ],
  "configurations" : [
    { 
      "yarn-site" : {
        "yarn.nodemanager.log-dirs" : "/mnt/FIXME/hadoop/yarn/log",
        "yarn.nodemanager.local-dirs" : "/mnt/FIXME/hadoop/yarn/local",
        "yarn.nodemanager.remote-app-log-dir" : "/mnt/FIXME/app-logs",
        "yarn.timeline-service.leveldb-timeline-store.path" : "/mnt/FIXME/hadoop/yarn/timeline"
      }
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "sscheck-SBC-ODROID-C2",
    "stack_name" : "HDP",
    "stack_version" : "2.5"
  } 
}
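
Once Ambari is running on the master, the blueprint above can be registered and instantiated through Ambari's REST API. Below is a minimal Scala sketch of those two calls; the Ambari host, the admin:admin credentials, and the `blueprint.json` / `cluster-template.json` file names are assumptions for illustration, while the `/api/v1/blueprints` and `/api/v1/clusters` endpoints and the `X-Requested-By` header are Ambari's standard blueprint API.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import java.util.Base64

// Sketch: register the blueprint above and create a cluster from it through
// the Ambari REST API. Hostname, credentials and file names are assumptions.
object DeployBlueprint {
  val ambari = "http://cubox-master:8080"            // assumed Ambari server URL
  val auth = "Basic " + Base64.getEncoder.encodeToString(
    "admin:admin".getBytes(StandardCharsets.UTF_8))  // Ambari default credentials

  // POST a JSON file to an Ambari endpoint and return the HTTP status code
  def post(path: String, bodyFile: String): Int = {
    val conn = new URL(ambari + path).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Authorization", auth)
    conn.setRequestProperty("X-Requested-By", "ambari") // required by Ambari for POSTs
    conn.setDoOutput(true)
    conn.getOutputStream.write(Files.readAllBytes(Paths.get(bodyFile)))
    conn.getResponseCode
  }

  def main(args: Array[String]): Unit = {
    // 1. register the blueprint under the name declared in the JSON above
    println(post("/api/v1/blueprints/sscheck-SBC-ODROID-C2", "blueprint.json"))
    // 2. instantiate a cluster, using a template that maps the host groups
    //    ("masters", "slaves") to the actual Cubox/ODROID hostnames
    println(post("/api/v1/clusters/sscheck-test", "cluster-template.json"))
  }
}
```

The same two POSTs are usually done with curl; the point is just that, once the blueprint is written, cluster creation reduces to two REST calls.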
juanrh commented 8 years ago

See a precedent for Spark at http://forum.odroid.com/viewtopic.php?f=98&t=21369 and check it.

juanrh commented 8 years ago

Another interesting precedent: http://climbers.net/sbc/40-core-arm-cluster-nanopc-t3/

juanrh commented 8 years ago

According to https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/yarn_setup.html, HDFS is required for running Flink on YARN, because Flink uses HDFS to distribute the über jar, just like MapReduce. Possible options:

The first option sounds best for a first approach. The first cluster should ideally have one Cubox and three ODROIDs, to have one master and two slave nodes, but with only two ODROIDs we might get about 4 containers of roughly 700 MB each, as the sizing sketch below illustrates.
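
To make that estimate concrete, here is a back-of-the-envelope sizing sketch in Scala. The 2 GB of RAM is the ODROID C2's spec, but the OS reservation and container size are assumptions chosen to land near the estimate above; the YARN property names in the comments are the real ones that would go into the `yarn-site` section of the blueprint.

```scala
// Back-of-the-envelope container sizing for the 2-ODROID setup.
object ContainerSizing {
  def main(args: Array[String]): Unit = {
    val odroidRamMb  = 2048  // physical RAM per ODROID C2
    val osReservedMb = 512   // assumed headroom for OS and Hadoop daemons
    val nmMemoryMb   = odroidRamMb - osReservedMb // yarn.nodemanager.resource.memory-mb
    val containerMb  = 768   // yarn.scheduler.minimum-allocation-mb
    val slaves       = 2
    val containers   = slaves * (nmMemoryMb / containerMb)
    println(s"$containers containers of $containerMb MB each") // 4 containers of 768 MB each
  }
}
```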

juanrh commented 8 years ago

For Spark, if we have a separate head node, then the driver would run either on the head node (yarn-client mode) or in a container on an ODROID slave (yarn-cluster mode). Either way the Cubox would not be executing computations, so a proof of concept with the Cubox as ResourceManager (compute master), NameNode (data master) and also the only DataNode (data slave) still makes sense; a minimal smoke-test job for that PoC is sketched below. Future setups could include:
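
As for the smoke test mentioned above, a minimal sketch in Scala follows: a trivial distributed sum run on YARN, just to check that containers get scheduled on the ODROID slaves. The app name is made up, and `HADOOP_CONF_DIR` is assumed to point at the cluster configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal smoke test for the PoC cluster: run a trivial distributed sum on
// YARN to check that containers are scheduled on the ODROID slaves.
object ClusterSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sscheck-yarn-smoke-test")
    // master ("yarn") and deploy mode (client vs. cluster) are normally given
    // to spark-submit, e.g. --master yarn --deploy-mode cluster
    val sc = new SparkContext(conf)
    try {
      val total = sc.parallelize(1 to 1000, numSlices = 4).map(_ * 2).sum()
      println(s"total = $total") // expected: 1001000.0
    } finally {
      sc.stop()
    }
  }
}
```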