juanrh / sscheck

ScalaCheck for Spark
Apache License 2.0

Setup test cluster: Cubox 4x4 + ODROID C2 #50

Open juanrh opened 8 years ago

juanrh commented 8 years ago

Set up a test cluster with one Cubox 4x4 as master and two ODROID C2 boards as slaves. Try running the latest Hortonworks HDP on Ubuntu with only YARN installed, and only Ambari for monitoring (no Ganglia or Nagios), i.e. with the following Ambari blueprint:

{
  "host_groups" : [
    {
      "name" : "masters",
      "components" : [
        {
          "name" : "RESOURCEMANAGER"
        },
        {
          "name" : "APP_TIMELINE_SERVER"
        }
      ],
      "cardinality" : "1"
    },
    {
      "name" : "slaves",
      "components" : [
        {
          "name" : "NODEMANAGER"
        }
      ],
      "cardinality" : "*"
    }
  ],
  "configurations" : [
    { 
      "yarn-site" : {
        "yarn.nodemanager.log-dirs" : "/mnt/FIXME/hadoop/yarn/log",
        "yarn.nodemanager.local-dirs" : "/mnt/FIXME/hadoop/yarn/local",
        "yarn.nodemanager.remote-app-log-dir" : "/mnt/FIXME/app-logs",
        "yarn.timeline-service.leveldb-timeline-store.path" : "/mnt/FIXME/hadoop/yarn/timeline"
      }
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "sscheck-SBC-ODROID-C2",
    "stack_name" : "HDP",
    "stack_version" : "2.5"
  } 
}
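
Once Ambari is running on the master, the blueprint above can be registered and instantiated through Ambari's REST API. Below is a minimal Scala sketch of those two calls; the Ambari host, the admin:admin credentials, and the `blueprint.json` / `cluster-template.json` file names are assumptions for illustration, while the `/api/v1/blueprints` and `/api/v1/clusters` endpoints and the `X-Requested-By` header are Ambari's standard blueprint API.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import java.util.Base64

// Sketch: register the blueprint above and create a cluster from it through
// the Ambari REST API. Hostname, credentials and file names are assumptions.
object DeployBlueprint {
  val ambari = "http://cubox-master:8080"            // assumed Ambari server URL
  val auth = "Basic " + Base64.getEncoder.encodeToString(
    "admin:admin".getBytes(StandardCharsets.UTF_8))  // Ambari default credentials

  // POST a JSON file to an Ambari endpoint and return the HTTP status code
  def post(path: String, bodyFile: String): Int = {
    val conn = new URL(ambari + path).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Authorization", auth)
    conn.setRequestProperty("X-Requested-By", "ambari") // required by Ambari for POSTs
    conn.setDoOutput(true)
    conn.getOutputStream.write(Files.readAllBytes(Paths.get(bodyFile)))
    conn.getResponseCode
  }

  def main(args: Array[String]): Unit = {
    // 1. register the blueprint under the name declared in the JSON above
    println(post("/api/v1/blueprints/sscheck-SBC-ODROID-C2", "blueprint.json"))
    // 2. instantiate a cluster, using a template that maps the host groups
    //    ("masters", "slaves") to the actual Cubox/ODROID hostnames
    println(post("/api/v1/clusters/sscheck-test", "cluster-template.json"))
  }
}
```

The same two POSTs are usually done with curl; the point is just that, once the blueprint is written, cluster creation reduces to two REST calls.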
juanrh commented 8 years ago

See a precedent for Spark at http://forum.odroid.com/viewtopic.php?f=98&t=21369 and check it.

juanrh commented 8 years ago

Another interesting precedent: http://climbers.net/sbc/40-core-arm-cluster-nanopc-t3/

juanrh commented 8 years ago

According to https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/yarn_setup.html, HDFS is required for running Flink on YARN, because Flink uses HDFS to distribute the über jar, just like MapReduce. Possible options:

The first option sounds best for a first approach. The first cluster should ideally have one Cubox and three ODROIDs, to have one master and two slave nodes, but with only two ODROIDs we might get about 4 containers of roughly 700 MB each, as the sizing sketch below illustrates.
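
To make that estimate concrete, here is a back-of-the-envelope sizing sketch in Scala. The 2 GB of RAM is the ODROID C2's spec, but the OS reservation and container size are assumptions chosen to land near the estimate above; the YARN property names in the comments are the real ones that would go into the `yarn-site` section of the blueprint.

```scala
// Back-of-the-envelope container sizing for the 2-ODROID setup.
object ContainerSizing {
  def main(args: Array[String]): Unit = {
    val odroidRamMb  = 2048  // physical RAM per ODROID C2
    val osReservedMb = 512   // assumed headroom for OS and Hadoop daemons
    val nmMemoryMb   = odroidRamMb - osReservedMb // yarn.nodemanager.resource.memory-mb
    val containerMb  = 768   // yarn.scheduler.minimum-allocation-mb
    val slaves       = 2
    val containers   = slaves * (nmMemoryMb / containerMb)
    println(s"$containers containers of $containerMb MB each") // 4 containers of 768 MB each
  }
}
```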

juanrh commented 8 years ago

For Spark, if we have a separate head node, then the driver would run either on the head node (yarn-client mode) or in a container on an ODROID slave (yarn-cluster mode). Either way the Cubox would not be executing computations, so a proof of concept with the Cubox as ResourceManager (compute master), NameNode (data master) and also the only DataNode (data slave) still makes sense; a minimal smoke-test job for that PoC is sketched below. Future setups could include:
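
As for the smoke test mentioned above, a minimal sketch in Scala follows: a trivial distributed sum run on YARN, just to check that containers get scheduled on the ODROID slaves. The app name is made up, and `HADOOP_CONF_DIR` is assumed to point at the cluster configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal smoke test for the PoC cluster: run a trivial distributed sum on
// YARN to check that containers are scheduled on the ODROID slaves.
object ClusterSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sscheck-yarn-smoke-test")
    // master ("yarn") and deploy mode (client vs. cluster) are normally given
    // to spark-submit, e.g. --master yarn --deploy-mode cluster
    val sc = new SparkContext(conf)
    try {
      val total = sc.parallelize(1 to 1000, numSlices = 4).map(_ * 2).sum()
      println(s"total = $total") // expected: 1001000.0
    } finally {
      sc.stop()
    }
  }
}
```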