mesosphere-backup / hdfs-deprecated

[DEPRECATED] This project is deprecated. It will be archived on December 1, 2017.
Apache License 2.0

Journal Node's task lost when launching #194

Closed · F21 closed 8 years ago

F21 commented 9 years ago

I have 4 Mesos 0.23 nodes (1 master/slave and 3 slaves) running via Vagrant. I am using Java 8 and running everything natively on CoreOS 794.0.0 (not inside Docker containers).

I have built the HEAD of the HDFS framework and launched it using Marathon 0.10.1.
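
For reference, the scheduler is submitted to Marathon roughly like this (a sketch; the app JSON is illustrative rather than my exact definition, and <marathon-host> is a placeholder):

# Sketch only: app JSON is illustrative; <marathon-host> is a placeholder.
# The cmd matches what the slave later forks (see the scheduler stderr below).
curl -X POST http://<marathon-host>:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{"id": "/hdfs", "cmd": "cd hdfs-mesos-0.1.3 && ./bin/hdfs-mesos", "cpus": 0.5, "mem": 512, "instances": 1}'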

I noticed that after deploying the scheduler via Marathon, it attempts to launch a JournalNode on one of my slaves. However, the JournalNode task becomes lost, and the scheduler then attempts to launch another JournalNode; this repeats until I kill the scheduler.
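
To dig into why the task goes lost, the executor sandbox on the affected slave can be inspected along these lines (a sketch; the IDs are placeholders, and the directory layout matches the slave's work directory shown in the logs further down):

# Placeholders: substitute the slave/framework/executor/container IDs from the slave log
cd /tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/<container-id>
cat stdout stderr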

This is my mesos-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mesos.hdfs.data.dir</name>
    <description>The primary data directory in HDFS</description>
    <value>/var/lib/hdfs/data</value>
  </property>

  <property>
    <name>mesos.hdfs.secondary.data.dir</name>
    <description>The secondary data directory in HDFS</description>
    <value>/var/run/hadoop-hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.native-hadoop-binaries</name>
    <description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
    <value>false</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.mnt.path</name>
    <description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
    <value>/opt/mesosphere</value>
  </property>

  <property>
    <name>mesos.hdfs.state.zk</name>
    <description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.master.uri</name>
    <description>Zookeeper entry for mesos master location</description>
    <value>zk://master.mesos:2181/mesos</value>
  </property>

  <property>
    <name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
    <description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.name</name>
    <description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
    <value>hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns</name>
    <description>Whether to use Mesos DNS for service discovery within HDFS</description>
    <value>true</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns.domain</name>
    <description>Root domain name of Mesos DNS (usually 'mesos')</description>
    <value>mesos</value>
  </property>

  <property>
    <name>mesos.native.library</name>
    <description>Location of libmesos.so</description>
    <value>/opt/test/packages/mesos/lib/libmesos.so</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.count</name>
    <description>Number of journal nodes (must be odd)</description>
    <value>1</value>
  </property>

  <!-- Additional settings for fine-tuning -->
  <property>
    <name>mesos.hdfs.jvm.overhead</name>
    <description>Multiplier on resources reserved in order to account for JVM allocation</description>
    <value>1</value>
  </property>

  <property>
    <name>mesos.hdfs.hadoop.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.user</name>
    <value>root</value>
  </property>

  <property>
    <name>mesos.hdfs.role</name>
    <value>*</value>
  </property>
</configuration>
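
A few quick sanity checks against the values above (a sketch, run on each slave):

# Verify that the paths and hostnames referenced in mesos-site.xml exist/resolve
getent hosts master.mesos                          # Mesos-DNS entry used for ZK and the master
ls -l /opt/test/packages/mesos/lib/libmesos.so     # mesos.native.library
ls -ld /var/lib/hdfs/data /var/run/hadoop-hdfs     # mesos.hdfs data directories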

This is the stderr of the scheduler:

Registered executor on 192.168.33.12
Starting task hdfs.b3f3e561-55c2-11e5-9700-080027530cf6
sh -c 'cd hdfs-mesos-0.1.3 && ./bin/hdfs-mesos'
Forked command at 1397
00:48:14.502 [HdfsScheduler] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
00:48:14.512 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @1394ms
00:48:14.572 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
00:48:14.599 [Thread-2] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=20150908-001609-169978048-5050-1255-0001
00:48:14.600 [Thread-2] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'RECONCILING_TASKS'
00:48:14.629 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@6b19b79{HTTP/1.1}{0.0.0.0:31541}
00:48:14.630 [main] INFO  org.eclipse.jetty.server.Server - Started @1521ms
00:48:14.642 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
00:48:14.655 [Thread-13] INFO  o.apache.mesos.hdfs.state.LiveState - Transitioning from acquisition phase 'RECONCILING_TASKS' to 'JOURNAL_NODES'
00:48:14.661 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:14.675 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
00:48:20.386 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
00:48:20.416 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:20.423 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:20.429 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:20.436 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:20.456 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:20.470 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:20.480 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:20.486 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:26.428 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:26.435 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:26.475 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:26.479 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:26.483 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:27.436 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:27.447 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:27.513 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:27.555 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:27.565 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:32.477 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:32.484 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:32.496 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:32.546 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:32.580 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:33.494 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:33.507 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:33.512 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:33.525 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:33.536 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:38.525 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:38.535 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:38.543 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:38.551 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:38.555 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:39.534 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:39.539 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:39.546 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:39.551 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:39.557 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:44.573 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:48:44.579 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:44.591 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:44.596 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:44.602 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:44.608 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:44.615 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:45.578 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:48:45.588 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:45.594 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:50.606 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:50.625 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:50.636 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:50.652 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:50.658 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:51.626 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:48:51.652 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:51.656 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:51.670 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:51.682 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:56.653 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:48:56.665 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:56.670 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:57.661 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:48:57.674 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:57.682 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:57.688 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:57.696 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:48:57.704 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:48:57.708 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:01.688 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:01.694 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:01.698 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:03.710 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:49:03.714 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:03.724 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:03.738 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:03.751 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:03.769 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:03.774 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:06.731 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:06.737 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:06.744 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:09.752 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:09.757 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:09.765 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:09.942 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:09.950 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:10.754 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:10.795 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:10.804 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:11.759 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:11.767 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:11.773 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:15.793 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:15.803 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:15.812 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:15.824 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:15.835 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:16.814 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:16.821 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:16.834 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:16.838 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:16.845 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:20.851 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:20.856 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:20.859 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:21.855 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:21.859 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:21.863 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:22.861 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:22.868 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:22.877 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:23.872 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:23.879 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:23.888 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:25.878 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:25.882 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:25.896 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:26.891 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:26.903 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:26.908 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:27.898 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:27.926 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:27.932 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:28.903 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:28.908 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:28.912 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:30.915 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:30.920 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:30.929 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:31.918 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:31.922 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:31.925 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:33.935 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:33.941 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:33.954 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:33.962 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:33.975 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:35.947 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:35.953 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:35.961 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:36.953 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:36.964 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:36.969 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:39.973 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:39.981 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:39.988 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:40.977 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:40.980 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:40.983 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:40.987 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:40.990 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:41.986 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:41.993 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:41.997 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:46.007 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:49:46.042 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:46.049 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:46.054 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:46.057 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:46.062 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:46.067 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:47.016 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:49:47.021 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:47.038 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:52.049 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:52.068 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:52.076 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:52.082 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:52.120 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:53.054 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:53.077 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:53.080 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:53.085 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:53.089 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:58.084 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:58.089 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:58.093 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:58.096 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:58.105 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:59.093 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:49:59.099 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:59.104 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:49:59.108 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:49:59.112 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:03.125 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:03.130 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:03.140 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:04.137 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:04.143 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:04.152 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:04.158 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:04.206 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:05.143 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:05.151 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:05.159 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:08.158 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:08.164 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:08.170 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:09.387 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1441673294675 state=TASK_LOST message='Abnormal executor termination' stagingTasks.size=1
00:50:09.387 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers
00:50:09.387 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1441673294675
00:50:09.389 [Thread-61] INFO  o.apache.mesos.hdfs.state.LiveState - Removing running task: value: "task.journalnode.journalnode.NodeExecutor.1441673294675"

00:50:09.443 [Thread-61] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'JOURNAL_NODES'
00:50:10.179 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:10.189 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [192.168.33.13]
00:50:10.195 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [192.168.33.13]
00:50:10.195 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
00:50:11.182 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:11.197 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:11.202 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:13.205 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:13.211 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:13.218 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:16.227 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:16.232 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:16.236 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:16.242 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:16.246 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:17.231 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:17.237 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:17.243 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:18.236 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:18.240 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:18.249 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:22.266 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:50:22.283 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:22.286 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:22.290 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:22.293 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:22.299 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:22.306 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:23.271 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:23.277 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:23.283 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:28.308 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:50:28.314 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:28.320 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:28.328 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:28.335 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:28.339 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:28.347 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:29.316 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:29.324 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:29.329 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:34.357 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
00:50:34.369 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:34.382 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:34.389 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:34.395 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:34.405 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:34.410 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:34.414 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:34.418 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:40.392 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:40.399 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:40.403 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:40.411 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:40.431 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:41.396 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:41.401 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:41.405 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:41.411 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:41.415 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:46.427 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:50:46.433 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:46.444 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:46.452 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:46.456 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:46.461 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:46.464 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:47.434 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:50:47.442 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:47.458 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:52.479 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
00:50:52.485 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:52.490 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:52.494 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:52.499 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:52.504 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:52.508 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:52.512 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:52.516 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:58.522 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:58.528 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:58.536 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:58.541 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:58.546 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:59.524 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:50:59.530 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:59.533 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:50:59.539 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:50:59.542 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:04.554 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:04.558 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:04.562 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:04.564 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:04.567 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:04.570 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:04.577 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:05.561 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:05.565 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:05.569 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:10.604 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:10.612 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:10.618 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:10.628 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:10.635 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:10.643 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:10.645 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:11.603 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:11.606 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:11.609 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:16.640 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:16.647 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:16.656 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:16.665 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:16.670 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:16.677 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:16.682 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:17.650 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:17.654 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:17.659 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:22.685 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:22.689 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:22.697 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:22.706 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:22.719 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:22.728 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:22.744 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:23.694 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:23.702 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:23.716 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:28.745 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:28.753 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:28.758 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:28.765 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:28.769 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:28.781 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:28.793 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:29.757 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:29.767 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:29.773 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:34.797 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:34.802 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:34.806 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:34.812 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:34.817 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:34.823 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:34.828 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:35.805 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:35.829 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:35.836 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:40.846 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:40.879 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:40.882 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:40.892 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:40.900 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:40.922 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:40.935 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:41.852 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:41.859 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:41.865 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:46.893 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:51:46.898 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:46.905 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:46.912 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:46.923 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:46.928 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:46.945 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:47.896 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:51:47.911 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:47.945 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:52.443 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1441673410196 state=TASK_LOST message='Abnormal executor termination' stagingTasks.size=1
00:51:52.444 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers
00:51:52.444 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1441673410196
00:51:52.444 [Thread-96] INFO  o.apache.mesos.hdfs.state.LiveState - Removing running task: value: "task.journalnode.journalnode.NodeExecutor.1441673410196"

00:51:52.501 [Thread-96] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'JOURNAL_NODES'
00:51:52.933 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:51:52.943 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [192.168.33.13]
00:51:52.947 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [192.168.33.13]
00:51:52.947 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
00:51:53.933 [Thread-98] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:51:53.946 [Thread-98] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:53.953 [Thread-98] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:53.998 [Thread-98] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:54.007 [Thread-98] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:58.980 [Thread-99] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:51:59.007 [Thread-99] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:59.014 [Thread-99] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:59.017 [Thread-99] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:59.020 [Thread-99] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:51:59.981 [Thread-100] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:51:59.986 [Thread-100] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:51:59.992 [Thread-100] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:00.010 [Thread-100] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:00.037 [Thread-100] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:05.020 [Thread-101] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:52:05.041 [Thread-101] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:05.061 [Thread-101] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:06.037 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:52:06.043 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:06.050 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:06.056 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:06.059 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:06.066 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:06.075 [Thread-102] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:11.067 [Thread-103] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:52:11.076 [Thread-103] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:11.088 [Thread-103] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:12.072 [Thread-104] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
00:52:12.111 [Thread-104] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:12.137 [Thread-104] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:12.147 [Thread-104] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:12.161 [Thread-104] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:13.081 [Thread-105] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:52:13.091 [Thread-105] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:13.111 [Thread-105] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:16.309 [Thread-106] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:52:16.318 [Thread-106] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:16.329 [Thread-106] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:18.199 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
00:52:18.209 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:18.306 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:18.313 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:18.320 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:18.328 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:18.348 [Thread-107] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
00:52:21.427 [Thread-108] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
00:52:21.438 [Thread-108] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
00:52:21.603 [Thread-108] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
Shutting down
Sending SIGTERM to process tree at pid 1397
Killing the following process trees:
[ 
-+- 1397 sh -c cd hdfs-mesos-0.1.3 && ./bin/hdfs-mesos 
 \--- 1398 /opt/test/packages/java/bin/java -cp lib/hdfs-scheduler-0.1.3-uber.jar -Dmesos.conf.path=etc/hadoop/mesos-site.xml -Dmesos.hdfs.config.server.port=31541 org.apache.mesos.hdfs.scheduler.Main 
]
Command terminated with signal Terminated (pid: 1397)

Inspecting the journal of the slave node where the JournalNode is launched didn't yield anything interesting:

Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.966559  1275 slave.cpp:1244] Got assigned task task.journalnode.journalnode.NodeExecutor.1441673294675 for framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.970618  1275 slave.cpp:1355] Launching task task.journalnode.journalnode.NodeExecutor.1441673294675 for framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.982698  1275 slave.cpp:4733] Launching executor executor.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150908-001609-169978048-5050-1255-S3/frameworks/20150908-001609-169978048-5050-1255-0001/executors/executor.journalnode.NodeExecutor.1441673294675/runs/126df935-512d-4e4f-b03b-7203f8ff6f75'
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.984588  1275 slave.cpp:1573] Queuing task 'task.journalnode.journalnode.NodeExecutor.1441673294675' for executor executor.journalnode.NodeExecutor.1441673294675 of framework '20150908-001609-169978048-5050-1255-0001
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.986526  1280 docker.cpp:739] No container info found, skipping launch
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.987340  1275 containerizer.cpp:534] Starting container '126df935-512d-4e4f-b03b-7203f8ff6f75' for executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001'
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.993027  1275 launcher.cpp:131] Forked child with pid '1331' for container '126df935-512d-4e4f-b03b-7203f8ff6f75'
Sep 08 00:48:14 core-04 mesos-slave[1271]: I0908 00:48:14.993971  1275 containerizer.cpp:770] Checkpointing executor's forked pid 1331 to '/tmp/mesos/meta/slaves/20150908-001609-169978048-5050-1255-S3/frameworks/20150908-001609-169978048-5050-1255-0001/executors/executor.journalnode.NodeExecutor.1441673294675/runs/126df935-512d-4e4f-b03b-7203f8ff6f75/pids/forked.pid'
Sep 08 00:48:15 core-04 mesos-slave[1271]: I0908 00:48:15.094144  1277 containerizer.cpp:1188] Executor for container '126df935-512d-4e4f-b03b-7203f8ff6f75' has exited
Sep 08 00:48:15 core-04 mesos-slave[1271]: I0908 00:48:15.094629  1277 containerizer.cpp:1001] Destroying container '126df935-512d-4e4f-b03b-7203f8ff6f75'
Sep 08 00:48:22 core-04 mesos-slave[1271]: I0908 00:48:22.160383  1273 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:48:22 core-04 mesos-slave[1271]: I0908 00:48:22.160852  1273 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:48:37 core-04 mesos-slave[1271]: I0908 00:48:37.008306  1280 slave.cpp:3842] Current disk usage 33.05%. Max allowed age: 3.986718477556910days
Sep 08 00:48:37 core-04 mesos-slave[1271]: I0908 00:48:37.163462  1277 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:48:37 core-04 mesos-slave[1271]: I0908 00:48:37.164271  1280 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:48:52 core-04 mesos-slave[1271]: I0908 00:48:52.165416  1273 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:48:52 core-04 mesos-slave[1271]: I0908 00:48:52.166563  1273 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:49:07 core-04 mesos-slave[1271]: I0908 00:49:07.167877  1280 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:49:07 core-04 mesos-slave[1271]: I0908 00:49:07.173120  1274 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:49:22 core-04 mesos-slave[1271]: I0908 00:49:22.176722  1279 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:49:22 core-04 mesos-slave[1271]: I0908 00:49:22.177196  1279 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:49:25 core-04 mesos-slave[1271]: I0908 00:49:25.053087  1274 http.cpp:174] HTTP GET for /slave(1)/state.json from 192.168.33.1:51923 with User-Agent='Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'
Sep 08 00:49:29 core-04 mesos-slave[1271]: I0908 00:49:29.423539  1276 http.cpp:174] HTTP GET for /slave(1)/state.json from 192.168.33.1:51923 with User-Agent='Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'
Sep 08 00:49:37 core-04 mesos-slave[1271]: I0908 00:49:37.009587  1274 slave.cpp:3842] Current disk usage 35.35%. Max allowed age: 3.825257808997153days
Sep 08 00:49:37 core-04 mesos-slave[1271]: I0908 00:49:37.199298  1277 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:49:37 core-04 mesos-slave[1271]: I0908 00:49:37.199853  1277 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:49:52 core-04 mesos-slave[1271]: I0908 00:49:52.202038  1277 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:49:52 core-04 mesos-slave[1271]: I0908 00:49:52.202692  1277 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 08 00:50:00 core-04 mesos-slave[1271]: I0908 00:50:00.183461  1279 http.cpp:174] HTTP GET for /slave(1)/state.json from 192.168.33.1:51923 with User-Agent='Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'
Sep 08 00:50:07 core-04 mesos-slave[1271]: I0908 00:50:07.203867  1273 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 08 00:50:09 core-04 mesos-slave[1271]: E0908 00:50:09.389519  1278 slave.cpp:3258] Container '126df935-512d-4e4f-b03b-7203f8ff6f75' for executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001' failed to start: Container destroyed during launch
Sep 08 00:50:09 core-04 mesos-slave[1271]: E0908 00:50:09.389919  1278 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001' failed: Unknown container: 126df935-512d-4e4f-b03b-7203f8ff6f75
Sep 08 00:50:09 core-04 mesos-slave[1271]: W0908 00:50:09.389967  1277 containerizer.cpp:990] Ignoring destroy of unknown container: 126df935-512d-4e4f-b03b-7203f8ff6f75
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.392066  1278 slave.cpp:2671] Handling status update TASK_LOST (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001 from @0.0.0.0:0
Sep 08 00:50:09 core-04 mesos-slave[1271]: W0908 00:50:09.392880  1277 containerizer.cpp:885] Ignoring update for unknown container: 126df935-512d-4e4f-b03b-7203f8ff6f75
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.393369  1279 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.394023  1279 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.394565  1278 slave.cpp:2926] Forwarding the update TASK_LOST (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001 to master@192.168.33.10:5050
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.470275  1273 status_update_manager.cpp:394] Received status update acknowledgement (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.470680  1273 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_LOST (UUID: c2905e85-cbe6-4288-80aa-a90499061984) for task task.journalnode.journalnode.NodeExecutor.1441673294675 of framework 20150908-001609-169978048-5050-1255-0001
Sep 08 00:50:09 core-04 mesos-slave[1271]: I0908 00:50:09.472021  1273 slave.cpp:3460] Cleaning up executor 'executor.journalnode.NodeExecutor.1441673294675' of framework 20150908-001609-169978048-5050-1255-0001
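
(For reference, the journal above was pulled with roughly the following; the unit name is an assumption based on how mesos-slave runs on my CoreOS boxes.)

# Unit name assumed from the "mesos-slave[1271]" entries above
journalctl -u mesos-slave --since "2015-09-08 00:48" --until "2015-09-08 00:53"
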
elingg commented 9 years ago

Hi @F21, a couple of things here:

F21 commented 9 years ago

Hey @elingg

I0909 00:17:01.210819  1271 logging.cpp:172] INFO level logging started!
I0909 00:17:01.211256  1271 fetcher.cpp:409] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150908-235003-169978048-5050-572-S2\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.13:31795\/hdfs-mesos-executor-0.1.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.13:31795\/hdfs-site.xml"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"https:\/\/downloads.mesosphere.io\/java\/jre-7u76-linux-x64.tar.gz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150908-235003-169978048-5050-572-S2\/frameworks\/20150908-235003-169978048-5050-572-0000\/executors\/executor.journalnode.NodeExecutor.1441757820880\/runs\/7d7635ba-e2f5-44c1-8991-3410e35eb245","user":"root"}
I0909 00:17:01.220074  1271 fetcher.cpp:364] Fetching URI 'http://192.168.33.13:31795/hdfs-mesos-executor-0.1.3.tgz'
I0909 00:17:01.220160  1271 fetcher.cpp:238] Fetching directly into the sandbox directory
I0909 00:17:01.220212  1271 fetcher.cpp:176] Fetching URI 'http://192.168.33.13:31795/hdfs-mesos-executor-0.1.3.tgz'
I0909 00:17:01.220265  1271 fetcher.cpp:126] Downloading resource from 'http://192.168.33.13:31795/hdfs-mesos-executor-0.1.3.tgz' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-mesos-executor-0.1.3.tgz'
I0909 00:17:04.645876  1271 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245' -xf '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-mesos-executor-0.1.3.tgz'
I0909 00:17:05.688472  1271 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-mesos-executor-0.1.3.tgz' into '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245'
W0909 00:17:05.688606  1271 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.13:31795/hdfs-mesos-executor-0.1.3.tgz
I0909 00:17:05.688980  1271 fetcher.cpp:441] Fetched 'http://192.168.33.13:31795/hdfs-mesos-executor-0.1.3.tgz' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-mesos-executor-0.1.3.tgz'
I0909 00:17:05.689014  1271 fetcher.cpp:364] Fetching URI 'http://192.168.33.13:31795/hdfs-site.xml'
I0909 00:17:05.689040  1271 fetcher.cpp:238] Fetching directly into the sandbox directory
I0909 00:17:05.689079  1271 fetcher.cpp:176] Fetching URI 'http://192.168.33.13:31795/hdfs-site.xml'
I0909 00:17:05.689121  1271 fetcher.cpp:126] Downloading resource from 'http://192.168.33.13:31795/hdfs-site.xml' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-site.xml'
W0909 00:17:05.717823  1271 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.13:31795/hdfs-site.xml
I0909 00:17:05.718186  1271 fetcher.cpp:441] Fetched 'http://192.168.33.13:31795/hdfs-site.xml' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/hdfs-site.xml'
I0909 00:17:05.718291  1271 fetcher.cpp:364] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0909 00:17:05.718391  1271 fetcher.cpp:238] Fetching directly into the sandbox directory
I0909 00:17:05.718444  1271 fetcher.cpp:176] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0909 00:17:05.718490  1271 fetcher.cpp:126] Downloading resource from 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/jre-7u76-linux-x64.tar.gz'
I0909 00:18:46.927767  1271 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245' -xf '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/jre-7u76-linux-x64.tar.gz'
I0909 00:18:48.727993  1271 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/jre-7u76-linux-x64.tar.gz' into '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245'
W0909 00:18:48.728261  1271 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz
I0909 00:18:48.728324  1271 fetcher.cpp:441] Fetched 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150908-235003-169978048-5050-572-S2/frameworks/20150908-235003-169978048-5050-572-0000/executors/executor.journalnode.NodeExecutor.1441757820880/runs/7d7635ba-e2f5-44c1-8991-3410e35eb245/jre-7u76-linux-x64.tar.gz'

The stderr is empty.

elingg commented 9 years ago

hm, taking a second look, it seems you have the oversubscription module running and it is affecting your containerizer. I've seen issues like this with Mesos modules. As you can see below, it's giving the unknown-container error and the abnormal-executor-termination error. You should see this error when launching various types of tasks, including with other frameworks.

I see this in the scheduler log:

state=TASK_LOST message='Abnormal executor termination'

I see this in the agent log:

Querying resource estimator for oversubscribable resources
Sep 08 00:50:09 core-04 mesos-slave[1271]: E0908 00:50:09.389519 1278 slave.cpp:3258] Container '126df935-512d-4e4f-b03b-7203f8ff6f75' for executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001' failed to start: Container destroyed during launch
Sep 08 00:50:09 core-04 mesos-slave[1271]: E0908 00:50:09.389919 1278 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001' failed: Unknown container: 126df935-512d-4e4f-b03b-7203f8ff6f75
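
If you want to double-check what that agent is actually running with, you can dump its flags from the state endpoint (a sketch, assuming the default agent port 5051 and that you have jq installed; substitute your slave's address):

# print the flags the slave was started with
curl -s http://192.168.33.13:5051/state.json | jq '.flags'

Anything unexpected under the isolation, containerizer, or resource estimator flags there would support the containerizer theory.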

F21 commented 9 years ago

I couldn't find a way to disable the oversubscription module, so I built binaries for mesos 0.22.1. I think oversubscription was a new feature in 0.23.
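
For reference, this is roughly how I built the 0.22.1 binaries (a sketch; installing the build dependencies is distro-specific and omitted here):

git clone https://github.com/apache/mesos.git
cd mesos
git checkout 0.22.1    # the release before oversubscription landed
./bootstrap            # generate the configure script
mkdir build && cd build
../configure
make -j4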

However, once that was done, if I launched it via marathon, the scheduler kept getting killed before it even finished downloading the mesos-hdfs framework.

I ended up manually copying the mesos-hdfs framework to a node and running it there.

Having done that, I still see JN1 getting killed.

This is the log from the scheduler:

2015-09-10 04:28:41,658:3965(0x7f2f14f72700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-10 04:28:41,658:3965(0x7f2f14f72700):ZOO_INFO@log_env@716: Client environment:host.name=core-02
2015-09-10 04:28:41,658:3965(0x7f2f14f72700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-10 04:28:41,658:3965(0x7f2f14f72700):ZOO_INFO@log_env@724: Client environment:os.arch=4.1.6-coreos-r1
2015-09-10 04:28:41,658:3965(0x7f2f14f72700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Thu Sep 3 01:30:12 UTC 2015
2015-09-10 04:28:41,661:3965(0x7f2f14f72700):ZOO_INFO@log_env@733: Client environment:user.name=core
2015-09-10 04:28:41,661:3965(0x7f2f14f72700):ZOO_INFO@log_env@741: Client environment:user.home=/home/core
2015-09-10 04:28:41,661:3965(0x7f2f14f72700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/hdfs-mesos-0.1.3
2015-09-10 04:28:41,661:3965(0x7f2f14f72700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=20000 watcher=0x7f2f0666ff50 sessionId=0 sessionPasswd=<null> context=0x7f2ef4001910 flags=0
2015-09-10 04:28:41,669:3965(0x7f2ef3fff700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-09-10 04:28:41,709:3965(0x7f2ef3fff700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14fb5018f750003, negotiated timeout=20000
04:28:41.801 [HdfsScheduler] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
2015-09-10 04:28:41,817:3965(0x7f2f15f74700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-10 04:28:41,817:3965(0x7f2f15f74700):ZOO_INFO@log_env@716: Client environment:host.name=core-02
04:28:41.817 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @1081ms
2015-09-10 04:28:41,818:3965(0x7f2f15f74700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-10 04:28:41,818:3965(0x7f2f15f74700):ZOO_INFO@log_env@724: Client environment:os.arch=4.1.6-coreos-r1
2015-09-10 04:28:41,818:3965(0x7f2f15f74700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Thu Sep 3 01:30:12 UTC 2015
I0910 04:28:41.818083  3992 sched.cpp:157] Version: 0.22.1
2015-09-10 04:28:41,827:3965(0x7f2f15f74700):ZOO_INFO@log_env@733: Client environment:user.name=core
2015-09-10 04:28:41,827:3965(0x7f2f15f74700):ZOO_INFO@log_env@741: Client environment:user.home=/home/core
2015-09-10 04:28:41,827:3965(0x7f2f15f74700):ZOO_INFO@log_env@753: Client environment:user.dir=/tmp/hdfs-mesos-0.1.3
2015-09-10 04:28:41,828:3965(0x7f2f15f74700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=10000 watcher=0x7f2f0666ff50 sessionId=0 sessionPasswd=<null> context=0x7f2eec002550 flags=0
2015-09-10 04:28:41,837:3965(0x7f2ef2ffd700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-09-10 04:28:41,843:3965(0x7f2ef2ffd700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14fb5018f750004, negotiated timeout=10000
I0910 04:28:41.844611  3983 group.cpp:313] Group process (group(1)@192.168.33.11:60961) connected to ZooKeeper
I0910 04:28:41.844859  3983 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0910 04:28:41.845044  3983 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0910 04:28:41.871109  3983 detector.cpp:138] Detected a new leader: (id='1')
I0910 04:28:41.872189  3983 group.cpp:659] Trying to get '/mesos/info_0000000001' in ZooKeeper
I0910 04:28:41.879436  3983 detector.cpp:452] A new leading master (UPID=master@192.168.33.10:5050) is detected
I0910 04:28:41.880085  3983 sched.cpp:254] New master detected at master@192.168.33.10:5050
04:28:41.880 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
I0910 04:28:41.881660  3983 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0910 04:28:41.896057  3986 sched.cpp:448] Framework registered with 20150910-020501-169978048-5050-1230-0001
04:28:41.937 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@696dc8c7{HTTP/1.1}{0.0.0.0:8765}
04:28:41.937 [Thread-3] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=20150910-020501-169978048-5050-1230-0001
04:28:41.938 [main] INFO  org.eclipse.jetty.server.Server - Started @1209ms
04:28:41.939 [Thread-3] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'RECONCILING_TASKS'
04:28:42.046 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:28:42.058 [Thread-13] INFO  o.apache.mesos.hdfs.state.LiveState - Transitioning from acquisition phase 'RECONCILING_TASKS' to 'JOURNAL_NODES'
04:28:42.066 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:42.086 [Thread-13] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
04:28:47.842 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:28:47.857 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:47.861 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:47.866 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:47.869 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:47.873 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:47.877 [Thread-14] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:28:51,891:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 11ms
04:28:53.870 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:28:53.877 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:53.882 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:53.890 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:53.924 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:53.933 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:53.938 [Thread-15] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:59.894 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:28:59.901 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:59.908 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:59.914 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:59.919 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:28:59.930 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:28:59.935 [Thread-16] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:05.909 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:05.943 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:05.952 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:05.957 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:05.966 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:05.984 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:05.989 [Thread-17] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:29:11,929:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 18ms
04:29:11.932 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:11.984 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:12.002 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:12.015 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:12.038 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:12.087 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:12.283 [Thread-18] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:17.952 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:17.957 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:17.962 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:17.965 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:17.969 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:18.014 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:18.020 [Thread-19] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:24.001 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:29:24.012 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:24.079 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:25.008 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:29:25.049 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:25.072 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:25.090 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:25.153 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:29:25,302:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 23ms
04:29:31.034 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:31.075 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:31.081 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:31.086 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:31.108 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:31.126 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:31.149 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:29:31,992:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 18ms
2015-09-10 04:29:35,345:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 19ms
04:29:37.048 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:37.090 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:37.095 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:37.513 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:37.520 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:37.524 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:37.529 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:43.070 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:43.082 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:43.110 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:43.170 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:43.212 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:43.263 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:43.308 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:49.096 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:49.193 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:49.199 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:49.387 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:49.402 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:49.420 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:49.440 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:55.112 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:29:55.136 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:55.142 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:55.156 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:55.161 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:29:55.169 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:29:55.182 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:01.139 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:30:01.158 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:01.175 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:01.179 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:01.189 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:01.193 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:01.199 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:07.160 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
04:30:07.207 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:07.212 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:07.220 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:07.228 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:07.232 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:07.239 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:13.183 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:13.209 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:13.216 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:13.220 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:13.227 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:14.188 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:14.195 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:14.199 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:19.205 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:19.211 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:19.216 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:19.222 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:19.226 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:20.228 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:20.232 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:20.239 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:25.238 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:25.248 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:25.252 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:25.258 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:25.261 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:26.233 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:26.239 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:26.256 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:31.258 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:31.262 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:31.268 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:31.272 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:31.276 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:32.265 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:32.279 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:32.308 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:37.278 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:37.283 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:37.301 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:37.309 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:37.313 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:38.288 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:38.307 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:38.409 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:43.297 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:30:43.327 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:43.347 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:43.361 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:43.398 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:44.308 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:44.316 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:44.333 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:46.106 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1441859322086 state=TASK_LOST message='Abnormal executor termination' stagingTasks.size=1
04:30:46.106 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers
04:30:46.106 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1441859322086
04:30:46.108 [Thread-41] INFO  o.apache.mesos.hdfs.state.LiveState - Removing running task: value: "task.journalnode.journalnode.NodeExecutor.1441859322086"

04:30:46.177 [Thread-41] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'JOURNAL_NODES'
04:30:46.314 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:46.327 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [192.168.33.10]
04:30:46.327 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
04:30:49.324 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:49.359 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:49.372 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:50.324 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:50.350 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:50.389 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:52.329 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:52.332 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:52.340 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:55.339 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:55.345 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:55.375 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:56.340 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:56.351 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:56.374 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:30:58.358 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:30:58.369 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:30:58.380 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:01.357 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:01.380 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:01.398 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:03.372 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:03.377 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:03.390 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:04.372 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:04.378 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:04.391 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:08.379 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:08.390 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:08.401 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:09.381 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:09.385 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:09.388 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:10.383 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:10.508 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:10.512 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:14.396 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:14.406 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:14.410 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:15.401 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:15.459 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:15.469 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:17.411 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:17.420 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:17.431 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:20.437 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:20.444 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:20.448 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:21.435 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:21.440 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:21.445 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:23.435 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:23.440 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:23.445 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:26.485 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:26.488 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:26.495 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:27.451 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:27.455 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:27.472 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:28.460 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:28.467 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:28.475 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:31:32,236:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 71ms
04:31:33.469 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:31:33.478 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:33.511 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:33.519 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:33.552 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:34.479 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:34.498 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:34.504 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:31:38,965:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 56ms
04:31:39.492 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:31:39.508 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:39.543 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:39.563 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:39.572 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:40.497 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:40.501 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:40.505 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:45.516 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:45.523 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:45.529 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:46.521 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:31:46.530 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:46.540 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:46.546 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:46.551 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:51.545 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:51.553 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:51.561 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:52.539 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:52.554 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:52.576 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:53.546 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:53.585 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:53.590 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:58.571 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:31:58.577 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:58.588 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:58.591 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:58.606 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:31:59.569 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:31:59.594 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:31:59.745 [Thread-74] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:04.593 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:32:04.627 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:04.677 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:04.706 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:04.719 [Thread-75] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:05.600 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:05.610 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:05.648 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:32:09,035:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 26ms
04:32:10.628 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:10.652 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:10.691 [Thread-77] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:11.628 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:32:11.642 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:11.665 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:11.702 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:11.709 [Thread-78] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:16.637 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:16.720 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:16.765 [Thread-79] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:17.644 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:17.759 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:17.781 [Thread-80] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:18.644 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:18.693 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:18.762 [Thread-81] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:32:19,080:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 36ms
04:32:22.660 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:22.682 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:22.718 [Thread-82] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:23.664 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:23.699 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:23.712 [Thread-83] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:24.668 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:24.679 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:24.711 [Thread-84] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:28.679 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:28.694 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:28.744 [Thread-85] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:29.684 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:29.827 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:29.860 [Thread-86] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:30.701 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:30.734 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:30.783 [Thread-87] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:34.711 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:34.779 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:34.971 [Thread-88] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:35.716 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:35.791 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:35.809 [Thread-89] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:36.720 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:36.752 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:36.786 [Thread-90] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:41.741 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:41.991 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:42.004 [Thread-91] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:42.747 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:32:42.761 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:42.777 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:42.804 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:42.810 [Thread-92] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:47.765 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:47.779 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:47.790 [Thread-93] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:48.769 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
04:32:48.782 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:48.791 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:48.808 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:48.851 [Thread-94] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
2015-09-10 04:32:52,473:3965(0x7f2ef2ffd700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 17ms
04:32:53.786 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:53.813 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:53.852 [Thread-95] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:54.793 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:54.798 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:54.808 [Thread-96] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes
04:32:55.794 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
04:32:55.804 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
04:32:55.809 [Thread-97] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 1 journalnodes

The stderr from the JN1 task:

I0910 04:28:42.494109  6096 logging.cpp:172] INFO level logging started!
I0910 04:28:42.500326  6096 fetcher.cpp:214] Fetching URI 'http://192.168.33.11:8765/hdfs-mesos-executor-0.1.3.tgz'
I0910 04:28:42.500411  6096 fetcher.cpp:125] Fetching URI 'http://192.168.33.11:8765/hdfs-mesos-executor-0.1.3.tgz' with os::net
I0910 04:28:42.500459  6096 fetcher.cpp:135] Downloading 'http://192.168.33.11:8765/hdfs-mesos-executor-0.1.3.tgz' to '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e/hdfs-mesos-executor-0.1.3.tgz'
I0910 04:28:47.882668  6096 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e/hdfs-mesos-executor-0.1.3.tgz' into '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e'
I0910 04:28:47.888000  6096 fetcher.cpp:214] Fetching URI 'http://192.168.33.11:8765/hdfs-site.xml'
I0910 04:28:47.888078  6096 fetcher.cpp:125] Fetching URI 'http://192.168.33.11:8765/hdfs-site.xml' with os::net
I0910 04:28:47.888118  6096 fetcher.cpp:135] Downloading 'http://192.168.33.11:8765/hdfs-site.xml' to '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e/hdfs-site.xml'
I0910 04:28:47.932752  6096 fetcher.cpp:214] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0910 04:28:47.932917  6096 fetcher.cpp:125] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' with os::net
I0910 04:28:47.932968  6096 fetcher.cpp:135] Downloading 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e/jre-7u76-linux-x64.tar.gz'
I0910 04:30:46.004843  6096 fetcher.cpp:78] Extracted resource '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e/jre-7u76-linux-x64.tar.gz' into '/tmp/mesos/slaves/20150910-020501-169978048-5050-1230-S0/frameworks/20150910-020501-169978048-5050-1230-0001/executors/executor.journalnode.NodeExecutor.1441859322086/runs/851cfc25-42a6-45df-9c4f-0e52138fed2e'

The stdout is empty.

Other frameworks seem to be running fine; I tried Elasticsearch and Kafka.

elingg commented 9 years ago

If you are running other frameworks on your cluster, including ones that run in docker, it could be related to a Mesos bug involving systemd and the containerizer: when multiple frameworks run side by side (some in docker, some outside docker), executors and tasks get killed.

If you keep seeing something like the following, it is a sure sign of containerizer issues:

Sep 08 00:50:09 core-04 mesos-slave[1271]: E0908 00:50:09.389919 1278 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441673294675' of framework '20150908-001609-169978048-5050-1255-0001' failed: Unknown container: 126df935-512d-4e4f-b03b-7203f8ff6f75
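
An easy way to watch for those errors on a systemd-managed slave is to follow the journal (a sketch; adjust the unit name if your distribution uses a different one):

# tail the slave's journal and surface the telltale containerizer errors
journalctl -u mesos-slave -f | grep -E "Unknown container|failed to start|Abnormal executor"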

F21 commented 9 years ago

In the above tests, I ran the hdfs framework by itself (after stopping and removing all other frameworks) and in fresh clusters.

Is there any way to get more debug info out of the framework or mesos to work out the exact root cause?

I will also set up an Ubuntu cluster to test and see if it suffers from the same problem.
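
In the meantime, I'll try bumping the slave's log verbosity (a sketch, assuming Mesos honors the usual glog environment variables, which I haven't confirmed on this build):

# stop the service first, then run the slave in the foreground with verbose glog output
sudo systemctl stop mesos-slave
sudo GLOG_v=1 mesos-slave --master=zk://master.mesos:2181/mesos --work_dir=/tmp/mesos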

F21 commented 8 years ago

Here are my results from setting up an Ubuntu cluster containing 4 nodes: 1 master/slave and 3 slaves.

Mesos is 0.23.0 and marathon is 0.10.1.

In this case, I am still seeing the JournalNode task being lost (the same situation as when I was using CoreOS).

The logs from the mesos-slave where the scheduler is launched:

Sep 11 09:40:58 mesos-slave-03 mesos-slave[733]: I0911 09:40:58.568208   884 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:41:13 mesos-slave-03 mesos-slave[733]: I0911 09:41:13.573972   885 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:41:13 mesos-slave-03 mesos-slave[733]: I0911 09:41:13.574148   885 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:41:28 mesos-slave-03 mesos-slave[733]: I0911 09:41:28.579911   886 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:41:28 mesos-slave-03 mesos-slave[733]: I0911 09:41:28.580255   886 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.585804   887 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.586032   887 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.957613   883 slave.cpp:1244] Got assigned task task.journalnode.journalnode.NodeExecutor.1441964503869 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.958948   883 slave.cpp:1355] Launching task task.journalnode.journalnode.NodeExecutor.1441964503869 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.963919   883 slave.cpp:4733] Launching executor executor.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869/runs/c9c3afd6-af08-4af0-a2b6-5c17280c9239'
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.964347   885 docker.cpp:739] No container info found, skipping launch
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.964629   883 slave.cpp:1573] Queuing task 'task.journalnode.journalnode.NodeExecutor.1441964503869' for executor executor.journalnode.NodeExecutor.1441964503869 of framework '20150911-055603-169978048-5050-644-0000
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.965713   885 containerizer.cpp:534] Starting container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239' for executor 'executor.journalnode.NodeExecutor.1441964503869' of framework '20150911-055603-169978048-5050-644-0000'
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.967423   887 launcher.cpp:131] Forked child with pid '3106' for container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239'
Sep 11 09:41:43 mesos-slave-03 mesos-slave[733]: I0911 09:41:43.967866   887 containerizer.cpp:770] Checkpointing executor's forked pid 3106 to '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869/runs/c9c3afd6-af08-4af0-a2b6-5c17280c9239/pids/forked.pid'
Sep 11 09:41:46 mesos-slave-03 mesos-slave[733]: I0911 09:41:46.034028   886 slave.cpp:3842] Current disk usage 7.30%. Max allowed age: 5.788928537733993days
Sep 11 09:41:58 mesos-slave-03 mesos-slave[733]: I0911 09:41:58.587863   889 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:41:58 mesos-slave-03 mesos-slave[733]: I0911 09:41:58.588443   889 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:42:13 mesos-slave-03 mesos-slave[733]: I0911 09:42:13.588912   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:42:13 mesos-slave-03 mesos-slave[733]: I0911 09:42:13.589126   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:42:28 mesos-slave-03 mesos-slave[733]: I0911 09:42:28.590184   884 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:42:28 mesos-slave-03 mesos-slave[733]: I0911 09:42:28.590517   884 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:42:43 mesos-slave-03 mesos-slave[733]: I0911 09:42:43.592941   890 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:42:43 mesos-slave-03 mesos-slave[733]: I0911 09:42:43.593324   890 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:42:43 mesos-slave-03 mesos-slave[733]: I0911 09:42:43.966051   890 slave.cpp:3798] Terminating executor executor.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000 because it did not register within 1mins
Sep 11 09:42:43 mesos-slave-03 mesos-slave[733]: I0911 09:42:43.967139   890 containerizer.cpp:1001] Destroying container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239'
Sep 11 09:42:44 mesos-slave-03 mesos-slave[733]: I0911 09:42:44.087127   883 containerizer.cpp:1188] Executor for container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239' has exited
Sep 11 09:42:46 mesos-slave-03 mesos-slave[733]: I0911 09:42:46.036401   887 slave.cpp:3842] Current disk usage 7.38%. Max allowed age: 5.783462926418901days
Sep 11 09:42:58 mesos-slave-03 mesos-slave[733]: I0911 09:42:58.595213   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:42:58 mesos-slave-03 mesos-slave[733]: I0911 09:42:58.595502   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:43:13 mesos-slave-03 mesos-slave[733]: I0911 09:43:13.597286   887 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:43:13 mesos-slave-03 mesos-slave[733]: I0911 09:43:13.597537   887 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: E0911 09:43:15.821022   888 slave.cpp:3258] Container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239' for executor 'executor.journalnode.NodeExecutor.1441964503869' of framework '20150911-055603-169978048-5050-644-0000' failed to start: Container destroyed during launch
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: E0911 09:43:15.821300   888 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441964503869' of framework '20150911-055603-169978048-5050-644-0000' failed: Unknown container: c9c3afd6-af08-4af0-a2b6-5c17280c9239
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: W0911 09:43:15.821337   890 composing.cpp:520] Container 'c9c3afd6-af08-4af0-a2b6-5c17280c9239' is already destroyed
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.823056   888 slave.cpp:2671] Handling status update TASK_LOST (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000 from @0.0.0.0:0
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: W0911 09:43:15.823374   887 containerizer.cpp:885] Ignoring update for unknown container: c9c3afd6-af08-4af0-a2b6-5c17280c9239
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.823571   887 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.823814   887 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.827191   887 slave.cpp:2926] Forwarding the update TASK_LOST (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000 to master@192.168.33.10:5050
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.850566   884 status_update_manager.cpp:394] Received status update acknowledgement (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.850788   884 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_LOST (UUID: a0efed75-b4cc-4216-8dc4-cc21c5020077) for task task.journalnode.journalnode.NodeExecutor.1441964503869 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.853552   884 slave.cpp:3460] Cleaning up executor 'executor.journalnode.NodeExecutor.1441964503869' of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854105   884 slave.cpp:3549] Cleaning up framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854246   884 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869/runs/c9c3afd6-af08-4af0-a2b6-5c17280c9239' for gc 6.99999011670222days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854334   884 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869' for gc 6.99999011571852days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854410   884 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869/runs/c9c3afd6-af08-4af0-a2b6-5c17280c9239' for gc 6.99999011518519days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854480   884 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964503869' for gc 6.99999011474667days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854548   884 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99999011355556days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854614   884 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99999011315852days in the future
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.854683   884 status_update_manager.cpp:284] Closing status update streams for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.892761   884 slave.cpp:1244] Got assigned task task.journalnode.journalnode.NodeExecutor.1441964595884 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.893767   884 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.894336   888 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.894470   888 slave.cpp:1355] Launching task task.journalnode.journalnode.NodeExecutor.1441964595884 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.898707   888 slave.cpp:4733] Launching executor executor.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938'
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.899261   888 slave.cpp:1573] Queuing task 'task.journalnode.journalnode.NodeExecutor.1441964595884' for executor executor.journalnode.NodeExecutor.1441964595884 of framework '20150911-055603-169978048-5050-644-0000
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.899538   883 docker.cpp:739] No container info found, skipping launch
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.899785   883 containerizer.cpp:534] Starting container 'fecea3fc-87f2-404b-943f-08230a1f7938' for executor 'executor.journalnode.NodeExecutor.1441964595884' of framework '20150911-055603-169978048-5050-644-0000'
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.901474   883 launcher.cpp:131] Forked child with pid '3243' for container 'fecea3fc-87f2-404b-943f-08230a1f7938'
Sep 11 09:43:15 mesos-slave-03 mesos-slave[733]: I0911 09:43:15.901763   883 containerizer.cpp:770] Checkpointing executor's forked pid 3243 to '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/pids/forked.pid'
Sep 11 09:43:28 mesos-slave-03 mesos-slave[733]: I0911 09:43:28.599494   884 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:43:28 mesos-slave-03 mesos-slave[733]: I0911 09:43:28.599934   884 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:43:43 mesos-slave-03 mesos-slave[733]: I0911 09:43:43.603425   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:43:43 mesos-slave-03 mesos-slave[733]: I0911 09:43:43.604190   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:43:46 mesos-slave-03 mesos-slave[733]: I0911 09:43:46.037075   887 slave.cpp:3842] Current disk usage 8.06%. Max allowed age: 5.735835403677870days
Sep 11 09:43:58 mesos-slave-03 mesos-slave[733]: I0911 09:43:58.604714   890 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:43:58 mesos-slave-03 mesos-slave[733]: I0911 09:43:58.605131   890 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:44:13 mesos-slave-03 mesos-slave[733]: I0911 09:44:13.606503   884 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:44:13 mesos-slave-03 mesos-slave[733]: I0911 09:44:13.606879   884 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:44:15 mesos-slave-03 mesos-slave[733]: I0911 09:44:15.900657   886 slave.cpp:3798] Terminating executor executor.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000 because it did not register within 1mins
Sep 11 09:44:15 mesos-slave-03 mesos-slave[733]: I0911 09:44:15.901098   886 containerizer.cpp:1001] Destroying container 'fecea3fc-87f2-404b-943f-08230a1f7938'
Sep 11 09:44:16 mesos-slave-03 mesos-slave[733]: I0911 09:44:16.005170   889 containerizer.cpp:1188] Executor for container 'fecea3fc-87f2-404b-943f-08230a1f7938' has exited
Sep 11 09:44:28 mesos-slave-03 mesos-slave[733]: I0911 09:44:28.610803   889 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:44:28 mesos-slave-03 mesos-slave[733]: I0911 09:44:28.611074   889 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:44:43 mesos-slave-03 mesos-slave[733]: I0911 09:44:43.612027   890 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:44:43 mesos-slave-03 mesos-slave[733]: I0911 09:44:43.612483   890 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:44:46 mesos-slave-03 mesos-slave[733]: I0911 09:44:46.044634   885 slave.cpp:3842] Current disk usage 8.13%. Max allowed age: 5.730711694039757days
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: E0911 09:44:55.962294   884 slave.cpp:3258] Container 'fecea3fc-87f2-404b-943f-08230a1f7938' for executor 'executor.journalnode.NodeExecutor.1441964595884' of framework '20150911-055603-169978048-5050-644-0000' failed to start: Container destroyed during launch
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: E0911 09:44:55.962698   884 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441964595884' of framework '20150911-055603-169978048-5050-644-0000' failed: Unknown container: fecea3fc-87f2-404b-943f-08230a1f7938
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: W0911 09:44:55.962759   890 composing.cpp:520] Container 'fecea3fc-87f2-404b-943f-08230a1f7938' is already destroyed
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.964284   884 slave.cpp:2671] Handling status update TASK_LOST (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000 from @0.0.0.0:0
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: W0911 09:44:55.964607   883 containerizer.cpp:885] Ignoring update for unknown container: fecea3fc-87f2-404b-943f-08230a1f7938
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.964953   884 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.965289   884 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.970175   884 slave.cpp:2926] Forwarding the update TASK_LOST (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000 to master@192.168.33.10:5050
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.995640   885 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.996058   885 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_LOST (UUID: 746ef490-057b-41e1-ae93-fb8f984b6815) for task task.journalnode.journalnode.NodeExecutor.1441964595884 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.998778   885 slave.cpp:3460] Cleaning up executor 'executor.journalnode.NodeExecutor.1441964595884' of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999249   887 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938' for gc 6.99998843677037days in the future
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999308   885 slave.cpp:3549] Cleaning up framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999354   887 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884' for gc 6.99998843509333days in the future
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999428   887 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938' for gc 6.99998843453926days in the future
Sep 11 09:44:55 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999544   887 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884' for gc 6.99998843412148days in the future
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999615   887 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99998843276741days in the future
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.000236   887 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99998843239704days in the future
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:55.999428   889 status_update_manager.cpp:284] Closing status update streams for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.233333   890 slave.cpp:1244] Got assigned task task.journalnode.journalnode.NodeExecutor.1441964696232 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.234222   890 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.234328   890 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.234475   889 slave.cpp:1355] Launching task task.journalnode.journalnode.NodeExecutor.1441964696232 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.239325   889 slave.cpp:4733] Launching executor executor.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232/runs/f583dcb6-5a78-4d86-9576-da09527a5f6c'
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.240191   888 docker.cpp:739] No container info found, skipping launch
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.240340   889 slave.cpp:1573] Queuing task 'task.journalnode.journalnode.NodeExecutor.1441964696232' for executor executor.journalnode.NodeExecutor.1441964696232 of framework '20150911-055603-169978048-5050-644-0000
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.240444   886 containerizer.cpp:534] Starting container 'f583dcb6-5a78-4d86-9576-da09527a5f6c' for executor 'executor.journalnode.NodeExecutor.1441964696232' of framework '20150911-055603-169978048-5050-644-0000'
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.242135   888 launcher.cpp:131] Forked child with pid '3289' for container 'f583dcb6-5a78-4d86-9576-da09527a5f6c'
Sep 11 09:44:56 mesos-slave-03 mesos-slave[733]: I0911 09:44:56.242538   888 containerizer.cpp:770] Checkpointing executor's forked pid 3289 to '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232/runs/f583dcb6-5a78-4d86-9576-da09527a5f6c/pids/forked.pid'
Sep 11 09:44:58 mesos-slave-03 mesos-slave[733]: I0911 09:44:58.616075   889 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:44:58 mesos-slave-03 mesos-slave[733]: I0911 09:44:58.616453   889 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:45:13 mesos-slave-03 mesos-slave[733]: I0911 09:45:13.617174   885 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:45:13 mesos-slave-03 mesos-slave[733]: I0911 09:45:13.617477   885 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:45:28 mesos-slave-03 mesos-slave[733]: I0911 09:45:28.618423   887 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:45:28 mesos-slave-03 mesos-slave[733]: I0911 09:45:28.618767   887 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:45:43 mesos-slave-03 mesos-slave[733]: I0911 09:45:43.619093   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:45:43 mesos-slave-03 mesos-slave[733]: I0911 09:45:43.619428   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:45:46 mesos-slave-03 mesos-slave[733]: I0911 09:45:46.047137   887 slave.cpp:3842] Current disk usage 8.82%. Max allowed age: 5.682946585110810days
Sep 11 09:45:56 mesos-slave-03 mesos-slave[733]: I0911 09:45:56.242110   888 slave.cpp:3798] Terminating executor executor.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000 because it did not register within 1mins
Sep 11 09:45:56 mesos-slave-03 mesos-slave[733]: I0911 09:45:56.242458   888 containerizer.cpp:1001] Destroying container 'f583dcb6-5a78-4d86-9576-da09527a5f6c'
Sep 11 09:45:56 mesos-slave-03 mesos-slave[733]: I0911 09:45:56.355291   883 containerizer.cpp:1188] Executor for container 'f583dcb6-5a78-4d86-9576-da09527a5f6c' has exited
Sep 11 09:45:58 mesos-slave-03 mesos-slave[733]: I0911 09:45:58.621155   884 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:45:58 mesos-slave-03 mesos-slave[733]: I0911 09:45:58.621547   884 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:46:13 mesos-slave-03 mesos-slave[733]: I0911 09:46:13.623869   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:46:13 mesos-slave-03 mesos-slave[733]: I0911 09:46:13.624193   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: E0911 09:46:27.744279   884 slave.cpp:3258] Container 'f583dcb6-5a78-4d86-9576-da09527a5f6c' for executor 'executor.journalnode.NodeExecutor.1441964696232' of framework '20150911-055603-169978048-5050-644-0000' failed to start: Container destroyed during launch
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: E0911 09:46:27.744730   884 slave.cpp:3340] Termination of executor 'executor.journalnode.NodeExecutor.1441964696232' of framework '20150911-055603-169978048-5050-644-0000' failed: Unknown container: f583dcb6-5a78-4d86-9576-da09527a5f6c
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: W0911 09:46:27.744840   887 composing.cpp:520] Container 'f583dcb6-5a78-4d86-9576-da09527a5f6c' is already destroyed
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.746973   884 slave.cpp:2671] Handling status update TASK_LOST (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000 from @0.0.0.0:0
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: W0911 09:46:27.747392   884 containerizer.cpp:885] Ignoring update for unknown container: f583dcb6-5a78-4d86-9576-da09527a5f6c
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.748229   888 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.748561   888 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.753988   888 slave.cpp:2926] Forwarding the update TASK_LOST (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000 to master@192.168.33.10:5050
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.780983   888 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.781189   888 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_LOST (UUID: 69765d46-b587-47c3-b6d7-789e03081a4c) for task task.journalnode.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783129   888 slave.cpp:3460] Cleaning up executor 'executor.journalnode.NodeExecutor.1441964696232' of framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783387   883 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232/runs/f583dcb6-5a78-4d86-9576-da09527a5f6c' for gc 6.99999093409778days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783429   888 slave.cpp:3549] Cleaning up framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783458   883 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232' for gc 6.99999093339259days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783504   883 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232/runs/f583dcb6-5a78-4d86-9576-da09527a5f6c' for gc 6.99999093307852days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783571   883 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964696232' for gc 6.99999093265185days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783618   883 gc.cpp:56] Scheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99999093189926days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783735   883 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' for gc 6.99999093167704days in the future
Sep 11 09:46:27 mesos-slave-03 mesos-slave[733]: I0911 09:46:27.783519   888 status_update_manager.cpp:284] Closing status update streams for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:28 mesos-slave-03 mesos-slave[733]: I0911 09:46:28.625985   883 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:46:28 mesos-slave-03 mesos-slave[733]: I0911 09:46:28.626142   883 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.466192   887 slave.cpp:1244] Got assigned task task.journalnode.journalnode.NodeExecutor.1441964789479 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.468091   887 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.468191   887 gc.cpp:84] Unscheduling '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000' from gc
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.468298   887 slave.cpp:1355] Launching task task.journalnode.journalnode.NodeExecutor.1441964789479 for framework 20150911-055603-169978048-5050-644-0000
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.476373   887 slave.cpp:4733] Launching executor executor.journalnode.NodeExecutor.1441964789479 of framework 20150911-055603-169978048-5050-644-0000 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964789479/runs/d8109757-6f3b-46ca-90a9-eda87ce25553'
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.477032   887 slave.cpp:1573] Queuing task 'task.journalnode.journalnode.NodeExecutor.1441964789479' for executor executor.journalnode.NodeExecutor.1441964789479 of framework '20150911-055603-169978048-5050-644-0000
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.477267   887 docker.cpp:739] No container info found, skipping launch
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.477401   887 containerizer.cpp:534] Starting container 'd8109757-6f3b-46ca-90a9-eda87ce25553' for executor 'executor.journalnode.NodeExecutor.1441964789479' of framework '20150911-055603-169978048-5050-644-0000'
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.478739   884 launcher.cpp:131] Forked child with pid '3320' for container 'd8109757-6f3b-46ca-90a9-eda87ce25553'
Sep 11 09:46:29 mesos-slave-03 mesos-slave[733]: I0911 09:46:29.479094   884 containerizer.cpp:770] Checkpointing executor's forked pid 3320 to '/tmp/mesos/meta/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964789479/runs/d8109757-6f3b-46ca-90a9-eda87ce25553/pids/forked.pid'
Sep 11 09:46:36 mesos-slave-03 mesos-slave[733]: 2015-09-11 09:46:36,349:638(0x7ff2e9dd6700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 12ms
Sep 11 09:46:43 mesos-slave-03 mesos-slave[733]: I0911 09:46:43.626682   888 slave.cpp:4179] Querying resource estimator for oversubscribable resources
Sep 11 09:46:43 mesos-slave-03 mesos-slave[733]: I0911 09:46:43.627041   888 slave.cpp:4193] Received oversubscribable resources  from the resource estimator
Sep 11 09:46:46 mesos-slave-03 mesos-slave[733]: I0911 09:46:46.048861   887 slave.cpp:3842] Current disk usage 9.50%. Max allowed age: 5.634927629665150days

stdout is empty and stderr just contains some log entries showing the journal executor being downloaded and extracted:

I0911 09:43:15.941109  3249 fetcher.cpp:409] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150911-055603-169978048-5050-644-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.12:8765\/hdfs-mesos-executor-0.1.3.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.12:8765\/hdfs-site.xml"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"https:\/\/downloads.mesosphere.io\/java\/jre-7u76-linux-x64.tar.gz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150911-055603-169978048-5050-644-S3\/frameworks\/20150911-055603-169978048-5050-644-0000\/executors\/executor.journalnode.NodeExecutor.1441964595884\/runs\/fecea3fc-87f2-404b-943f-08230a1f7938","user":"root"}
I0911 09:43:15.943086  3249 fetcher.cpp:364] Fetching URI 'http://192.168.33.12:8765/hdfs-mesos-executor-0.1.3.tgz'
I0911 09:43:15.943120  3249 fetcher.cpp:238] Fetching directly into the sandbox directory
I0911 09:43:15.943151  3249 fetcher.cpp:176] Fetching URI 'http://192.168.33.12:8765/hdfs-mesos-executor-0.1.3.tgz'
I0911 09:43:15.943177  3249 fetcher.cpp:126] Downloading resource from 'http://192.168.33.12:8765/hdfs-mesos-executor-0.1.3.tgz' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-mesos-executor-0.1.3.tgz'
I0911 09:43:17.369046  3249 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938' -xf '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-mesos-executor-0.1.3.tgz'
I0911 09:43:18.306526  3249 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-mesos-executor-0.1.3.tgz' into '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938'
W0911 09:43:18.306586  3249 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.12:8765/hdfs-mesos-executor-0.1.3.tgz
I0911 09:43:18.306607  3249 fetcher.cpp:441] Fetched 'http://192.168.33.12:8765/hdfs-mesos-executor-0.1.3.tgz' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-mesos-executor-0.1.3.tgz'
I0911 09:43:18.306622  3249 fetcher.cpp:364] Fetching URI 'http://192.168.33.12:8765/hdfs-site.xml'
I0911 09:43:18.306656  3249 fetcher.cpp:238] Fetching directly into the sandbox directory
I0911 09:43:18.306704  3249 fetcher.cpp:176] Fetching URI 'http://192.168.33.12:8765/hdfs-site.xml'
I0911 09:43:18.306733  3249 fetcher.cpp:126] Downloading resource from 'http://192.168.33.12:8765/hdfs-site.xml' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-site.xml'
W0911 09:43:18.316086  3249 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.12:8765/hdfs-site.xml
I0911 09:43:18.316125  3249 fetcher.cpp:441] Fetched 'http://192.168.33.12:8765/hdfs-site.xml' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/hdfs-site.xml'
I0911 09:43:18.316148  3249 fetcher.cpp:364] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0911 09:43:18.316161  3249 fetcher.cpp:238] Fetching directly into the sandbox directory
I0911 09:43:18.316180  3249 fetcher.cpp:176] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0911 09:43:18.316197  3249 fetcher.cpp:126] Downloading resource from 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/jre-7u76-linux-x64.tar.gz'
I0911 09:44:54.110146  3249 fetcher.cpp:76] Extracting with command: tar -C '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938' -xf '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/jre-7u76-linux-x64.tar.gz'
I0911 09:44:55.855676  3249 fetcher.cpp:84] Extracted '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/jre-7u76-linux-x64.tar.gz' into '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938'
W0911 09:44:55.855677  3249 fetcher.cpp:260] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz
I0911 09:44:55.855731  3249 fetcher.cpp:441] Fetched 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/tmp/mesos/slaves/20150911-055603-169978048-5050-644-S3/frameworks/20150911-055603-169978048-5050-644-0000/executors/executor.journalnode.NodeExecutor.1441964595884/runs/fecea3fc-87f2-404b-943f-08230a1f7938/jre-7u76-linux-x64.tar.gz'
elingg commented 8 years ago

Hm, I have an Ubuntu cluster currently running correctly, but without oversubscription. This still appears to be some kind of containerizer issue related to oversubscription; I have seen issues with the oversubscription module interacting with the containerizer. The other possibility is a bug similar to https://issues.apache.org/jira/browse/MESOS-2601 or https://issues.apache.org/jira/browse/MESOS-2605, but those were fixed in Mesos 0.23.0.

Sep 11 09:45:56 mesos-slave-03 mesos-slave[733]: I0911 09:45:56.242110 888 slave.cpp:3798] Terminating executor executor.journalnode.NodeExecutor.1441964696232 of framework 20150911-055603-169978048-5050-644-0000 because it did not register within 1mins

F21 commented 8 years ago

Which versions of Ubuntu, Mesos, and Marathon are you running? Also, which version of the HDFS framework are you using? I've been building the latest HEAD (692b8b732fa1f14db41441a2ed78cd4baf058e3b as of yesterday).

I will try to set up a cluster identical to yours and see if I can get it to work.

elingg commented 8 years ago

There are a couple of pretty easy setups I usually run:

1) http://docs.mesosphere.com/ (DCOS). Very simple: just run dcos package install hdfs.

2) google.mesosphere.com provides a preinstalled Mesos and Marathon cluster (GCE with Debian or Ubuntu images; I recommend images with at least 4 CPUs).

Uninstall the preinstalled HDFS with the following commands:

aptitude purge hadoop hadoop-yarn hadoop-hdfs hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-0.20-mapreduce hadoop-0.20-mapreduce-jobtracker hadoop-0.20-mapreduce-tasktracker hadoop-mapreduce
rm -rf /etc/hadoop /mnt/hdfs /var/lib/hadoop* /var/log/hadoop*
dpkg -l hadoop*   # verify the uninstall

Before building HDFS-Mesos to run on GCE, the only configuration value I typically change is mesos.hdfs.zkfc.ha.zookeeper.quorum in mesos-site.xml: I change localhost to point to the list of ZooKeeper nodes (ZooKeeper runs on the master, so I can use masterip:2181; see the sketch after this list). The second step is to upload the HDFS tarball to the master and run it there.

3) Vagrant, here: https://github.com/Banno/vagrant-mesos. We developed HDFS-Mesos using this Vagrantfile.
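
To illustrate the change mentioned in 2), here is a rough sketch of the property in mesos-site.xml (masterip is just a placeholder for your actual master address):

<property>
  <name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
  <!-- masterip is a placeholder; list your actual ZooKeeper hostname-port pairs here -->
  <value>masterip:2181</value>
</property>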

I'm very curious to see whether your issue is related to Mesos modules or the containerizer. Let's get to the bottom of it and see if we can update the documentation and fix any additional issues. Thanks for your investigation, @F21!

F21 commented 8 years ago

Here are the results of some more testing.

This cluster consists of 4 nodes: 1 master/slave and 3 slaves. Ubuntu is 14.04 64-bit, Mesos is 0.22.1, and Marathon is 0.10.1.

OpenJDK said tools.jar was missing, so I installed Oracle JDK 8 using these instructions: http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html
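
Roughly, the steps from that page boil down to the following (assuming the webupd8team PPA; the installer package prompts you to accept Oracle's license):

# add the PPA and install the Oracle JDK 8 installer package
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer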

I also built the latest HEAD of mesos-hdfs and ran it on one of the slaves.

After launching the framework, I can see it registered in the Mesos web UI. However, no tasks are being launched.

This is the output of the hdfs framework:

2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-02
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-63-generic
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@725: Client environment:os.version=#103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@741: Client environment:user.home=/home/vagrant
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vagrant/hdfs/build/hdfs-mesos-0.1.4
2015-09-14 03:35:47,472:3872(0x7f741affd700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=20000 watcher=0x7f7422e35a60 sessionId=0 sessionPasswd=<null> context=0x7f7400000bf0 flags=0
2015-09-14 03:35:47,477:3872(0x7f7417ddc700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-09-14 03:35:47,485:3872(0x7f7417ddc700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14fc962e269000a, negotiated timeout=20000
03:35:47.521 [HdfsScheduler] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
2015-09-14 03:35:47,530:3872(0x7f741b7fe700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-14 03:35:47,530:3872(0x7f741b7fe700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-02
2015-09-14 03:35:47,530:3872(0x7f741b7fe700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-09-14 03:35:47,530:3872(0x7f741b7fe700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-63-generic
2015-09-14 03:35:47,530:3872(0x7f741b7fe700):ZOO_INFO@log_env@725: Client environment:os.version=#103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015
I0914 03:35:47.530618  3902 sched.cpp:157] Version: 0.22.1
2015-09-14 03:35:47,531:3872(0x7f741b7fe700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2015-09-14 03:35:47,531:3872(0x7f741b7fe700):ZOO_INFO@log_env@741: Client environment:user.home=/home/vagrant
2015-09-14 03:35:47,531:3872(0x7f741b7fe700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/vagrant/hdfs/build/hdfs-mesos-0.1.4
2015-09-14 03:35:47,531:3872(0x7f741b7fe700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=10000 watcher=0x7f7422e35a60 sessionId=0 sessionPasswd=<null> context=0x7f740c001230 flags=0
03:35:47.535 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @925ms
2015-09-14 03:35:47,535:3872(0x7f7416ad0700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-09-14 03:35:47,538:3872(0x7f7416ad0700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14fc962e269000b, negotiated timeout=10000
I0914 03:35:47.538872  3892 group.cpp:313] Group process (group(1)@192.168.33.12:38168) connected to ZooKeeper
I0914 03:35:47.538956  3892 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0914 03:35:47.538993  3892 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0914 03:35:47.544342  3892 detector.cpp:138] Detected a new leader: (id='3')
I0914 03:35:47.544850  3894 group.cpp:659] Trying to get '/mesos/info_0000000003' in ZooKeeper
I0914 03:35:47.546298  3894 detector.cpp:452] A new leading master (UPID=master@192.168.33.10:5050) is detected
I0914 03:35:47.546525  3893 sched.cpp:254] New master detected at master@192.168.33.10:5050
I0914 03:35:47.546869  3893 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0914 03:35:47.549916  3896 sched.cpp:448] Framework registered with 20150914-010557-169978048-5050-2050-0000
03:35:47.558 [Thread-2] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=20150914-010557-169978048-5050-2050-0000
03:35:47.559 [Thread-2] INFO  o.apache.mesos.hdfs.state.LiveState - Acquisition phase is already 'RECONCILING_TASKS'
03:35:47.589 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
03:35:47.631 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@1d082e88{HTTP/1.1}{0.0.0.0:8765}
03:35:47.632 [main] INFO  org.eclipse.jetty.server.Server - Started @1032ms
F21 commented 8 years ago

I noticed there was a bug in Mesos 0.22.1 that doesn't play well with Docker 1.8.1:

Sep 14 04:33:55 mesos-slave-02 mesos-slave[2195]: Failed to create a containerizer: Could not create DockerContainerizer: Insufficient version of Docker! Please upgrade to >= 1.0.0

Will downgrade the docker version and report back.

elingg commented 8 years ago

Thanks @F21, we can also discuss by email or IRC to debug what the issue is with the Mesos cluster or the framework.

F21 commented 8 years ago

@elingg Here are my latest findings:

Mesos is 0.22.1, Ubuntu is 14.04 64-bit, Marathon is 0.10.1, and Docker is 1.7.1.

Still seeing the same issue where the journal node fails to launch:

Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.326689  2080 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2691 ] on slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01) for framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.326963  2080 hierarchical.hpp:648] Recovered cpus(*):2.4; mem(*):212; disk(*):35164; ports(*):[31000-31565, 31567-32000] (total allocatable: cpus(*):2.4; mem(*):212; disk(*):35164; ports(*):[31000-31565, 31567-32000]) on slave 20150914-234602-169978048-5050-2015-S0 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:53 mesos-master-01 marathon[1148]: [2015-09-15 00:46:53,325] INFO started processing 2 offers, launching at most 1 tasks per offer and 1000 tasks in total (mesosphere.marathon.tasks.IterativeOfferMatcher$:132)
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.327729  2077 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2692 ] on slave 20150914-234602-169978048-5050-2015-S1 at slave(1)@192.168.33.11:5051 (mesos-slave-01) for framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.328044  2077 hierarchical.hpp:648] Recovered cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000]) on slave 20150914-234602-169978048-5050-2015-S1 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:53 mesos-master-01 marathon[1148]: [2015-09-15 00:46:53,326] INFO Offer [20150914-234602-169978048-5050-2015-O2693]. Decline with default filter refuseSeconds (use --decline_offer_duration to configure) (mesosphere.marathon.tasks.IterativeOfferMatcher$:231)
Sep 15 00:46:53 mesos-master-01 marathon[1148]: [2015-09-15 00:46:53,327] INFO Offer [20150914-234602-169978048-5050-2015-O2694]. Decline with default filter refuseSeconds (use --decline_offer_duration to configure) (mesosphere.marathon.tasks.IterativeOfferMatcher$:231)
Sep 15 00:46:53 mesos-master-01 marathon[1148]: [2015-09-15 00:46:53,327] INFO Launched 0 tasks on 0 offers, declining 2 (mesosphere.marathon.tasks.IterativeOfferMatcher$:241)
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.329223  2074 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2693 ] on slave 20150914-234602-169978048-5050-2015-S2 at slave(1)@192.168.33.12:5051 (mesos-slave-02) for framework 20150914-221832-169978048-5050-10929-0000 (marathon) at scheduler-8717a9c8-2e0e-43d4-82c5-4200d4d04cb1@192.168.33.10:52790
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.329744  2074 hierarchical.hpp:648] Recovered cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000]) on slave 20150914-234602-169978048-5050-2015-S2 from framework 20150914-221832-169978048-5050-10929-0000
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.330298  2080 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2694 ] on slave 20150914-234602-169978048-5050-2015-S3 at slave(1)@192.168.33.13:5051 (mesos-slave-03) for framework 20150914-221832-169978048-5050-10929-0000 (marathon) at scheduler-8717a9c8-2e0e-43d4-82c5-4200d4d04cb1@192.168.33.10:52790
Sep 15 00:46:53 mesos-master-01 mesos-master[2015]: I0915 00:46:53.330484  2080 hierarchical.hpp:648] Recovered cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000]) on slave 20150914-234602-169978048-5050-2015-S3 from framework 20150914-221832-169978048-5050-10929-0000
Sep 15 00:46:54 mesos-master-01 mesos-master[2015]: I0915 00:46:54.317855  2077 master.cpp:3760] Sending 2 offers to framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392
Sep 15 00:46:54 mesos-master-01 mesos-master[2015]: I0915 00:46:54.330700  2079 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2695 ] on slave 20150914-234602-169978048-5050-2015-S2 at slave(1)@192.168.33.12:5051 (mesos-slave-02) for framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392
Sep 15 00:46:54 mesos-master-01 mesos-master[2015]: I0915 00:46:54.331167  2079 hierarchical.hpp:648] Recovered cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000]) on slave 20150914-234602-169978048-5050-2015-S2 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:54 mesos-master-01 mesos-master[2015]: I0915 00:46:54.331682  2066 master.cpp:2273] Processing ACCEPT call for offers: [ 20150914-234602-169978048-5050-2015-O2696 ] on slave 20150914-234602-169978048-5050-2015-S3 at slave(1)@192.168.33.13:5051 (mesos-slave-03) for framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392
Sep 15 00:46:54 mesos-master-01 mesos-master[2015]: I0915 00:46:54.332104  2066 hierarchical.hpp:648] Recovered cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000] (total allocatable: cpus(*):4; mem(*):1000; disk(*):35164; ports(*):[31000-32000]) on slave 20150914-234602-169978048-5050-2015-S3 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: W0915 00:46:55.861906  2075 containerizer.cpp:907] Ignoring destroy of unknown container: df865edb-c019-4f1a-8a70-1cee56a4f792
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: E0915 00:46:55.862077  2068 slave.cpp:3112] Container 'df865edb-c019-4f1a-8a70-1cee56a4f792' for executor 'executor.journalnode.NodeExecutor.1442277857378' of framework '20150914-234602-169978048-5050-2015-0000' failed to start: Container destroyed during launch
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: E0915 00:46:55.862442  2068 slave.cpp:3207] Termination of executor 'executor.journalnode.NodeExecutor.1442277857378' of framework '20150914-234602-169978048-5050-2015-0000' failed: Unknown container: df865edb-c019-4f1a-8a70-1cee56a4f792
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: E0915 00:46:55.862818  2073 slave.cpp:3461] Failed to unmonitor container for executor executor.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000: Not monitored
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.863688  2068 slave.cpp:2531] Handling status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 from @0.0.0.0:0
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: W0915 00:46:55.863901  2068 containerizer.cpp:814] Ignoring update for unknown container: df865edb-c019-4f1a-8a70-1cee56a4f792
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.863982  2068 status_update_manager.cpp:317] Received status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.864219  2068 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.864292  2066 master.cpp:3393] Executor executor.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 on slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01) terminated with signal Unknown signal 127
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.864419  2066 master.cpp:4719] Removing executor 'executor.journalnode.NodeExecutor.1442277857378' with resources cpus(*):0.5; mem(*):256 of framework 20150914-234602-169978048-5050-2015-0000 on slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01)
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.864542  2066 hierarchical.hpp:648] Recovered cpus(*):0.5; mem(*):256 (total allocatable: cpus(*):2.9; mem(*):468; disk(*):35164; ports(*):[31000-31565, 31567-32000]) on slave 20150914-234602-169978048-5050-2015-S0 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.867033  2068 slave.cpp:2776] Forwarding the update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 to master@192.168.33.10:5050
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.867727  2072 master.cpp:3300] Status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 from slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01)
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.867784  2072 master.cpp:3341] Forwarding status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.867913  2072 master.cpp:4623] Updating the latest state of task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 to TASK_LOST
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.868052  2072 hierarchical.hpp:648] Recovered cpus(*):1; mem(*):512 (total allocatable: cpus(*):3.9; mem(*):980; disk(*):35164; ports(*):[31000-31565, 31567-32000]) on slave 20150914-234602-169978048-5050-2015-S0 from framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.890699  2074 master.cpp:4690] Removing task task.journalnode.journalnode.NodeExecutor.1442277857378 with resources cpus(*):1; mem(*):512 of framework 20150914-234602-169978048-5050-2015-0000 on slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01)
Sep 15 00:46:55 mesos-master-01 mesos-master[2015]: I0915 00:46:55.890844  2074 master.cpp:2787] Forwarding status update acknowledgement 826824c1-f1f0-4a8d-a07f-e841c3b402d5 for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000 (hdfs) at scheduler-32bcebe3-ec7c-4c6f-bed8-5a06a332624a@192.168.33.11:40392 to slave 20150914-234602-169978048-5050-2015-S0 at slave(1)@192.168.33.10:5051 (mesos-master-01)
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.891283  2073 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000
Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: I0915 00:46:55.891348  2073 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_LOST (UUID: 826824c1-f1f0-4a8d-a07f-e841c3b402d5) for task task.journalnode.journalnode.NodeExecutor.1442277857378 of framework 20150914-234602-169978048-5050-2015-0000

What's the best way to debug this over IRC? My timezone is AEST (Australia).

elingg commented 8 years ago

Ah, timezones are tricky, but we can set up a time if needed. You can always check with other Mesos users on IRC, as well as on the Mesos user mailing lists, to see if you get a response in your timezone. In the meantime, we can proceed through GitHub issues. This still seems to be a Mesos issue with the containerizer; see the similar Marathon issue with an unknown container: https://github.com/mesosphere/marathon/issues/734. Also, this is a custom executor: does the launch of the executor exceed the default mesos-slave executor registration timeout? What do you see being downloaded in the executor sandbox?

F21 commented 8 years ago

I am currently removing Marathon from the equation and launching hdfs-mesos manually via the command line on one of the nodes.
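
For reference, this is roughly how I'm launching it (a sketch assuming the tarball layout produced by the build; the hdfs-mesos-0.1.4 name comes from my earlier logs):

# unpack the built tarball and start the scheduler on the node
tar zxvf hdfs-mesos-0.1.4.tgz
cd hdfs-mesos-0.1.4
./bin/hdfs-mesos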

In terms of the downloads, I can see the executor and the Java 7 JRE being downloaded and extracted into the sandbox.

The executor_registration_timeout has always been set to a pretty high value, so it shouldn't be a problem.

What is strange is that I got the hdfs framework running back in June or July without too many problems. I will go and build 0.1.2 and see if it works.

elingg commented 8 years ago

If it's a Mesos containerizer issue, downgrading the HDFS-Mesos version probably won't help much, but you can try it to help diagnose. Running without Marathon might help if you are running Marathon or other apps in Docker, because the recent containerizer issues we've seen occur when users run Docker containers alongside Mesos containers. Since HDFS-Mesos doesn't require Docker, one thing to try is to remove docker from the slave containerizer flags.
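
A minimal sketch of that change to the mesos-slave invocation (assuming docker is currently listed; your other flags stay as they are):

# before: Docker and Mesos containerizers both enabled
mesos-slave --containerizers=docker,mesos ...

# after: Mesos containerizer only
mesos-slave --containerizers=mesos ...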

F21 commented 8 years ago

I did have Docker in my previous setup, but it's a good idea to set up a very minimal Mesos cluster with just Mesos, Mesos-DNS, and hdfs-mesos to replicate this. I think I will not install Marathon either. All of these are installed from the Mesosphere apt repos.

I'll give that a go and see if it makes a difference.

F21 commented 8 years ago

Just reporting back.

I am still seeing the journal node task being lost.

My cluster is running Ubuntu 14.04 64-bit with Mesos-DNS and the HEAD of HDFS-Mesos. Docker and Marathon were not installed.

elingg commented 8 years ago

Very strange; I'm not sure why you are seeing containerizer issues then. Still the same container error messages?

Sep 15 00:46:55 mesos-master-01 mesos-slave[2019]: W0915 00:46:55.863901 2068 containerizer.cpp:814] Ignoring update for unknown container: df865edb-c019-4f1a-8a70-1cee56a4f792

One thought, after double-checking your settings, is that your memory settings are very low:

mesos.hdfs.hadoop.heap.size
<value>256</value>

Perhaps the JN task requires more memory than that to even launch; HDFS does have certain minimums. Could you possibly try bumping up your memory settings? Would you also consider launching on the environments I mentioned above (DCOS, GCE, Vagrant)? That might give us an idea of what is different. You can certainly also try an older release, but I'm not sure that will help, especially since I'm unable to reproduce your issue. Again, thanks for your investigation!
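
As a sketch of what I mean in mesos-site.xml (512 is just an assumed larger value to try, not a verified minimum):

<property>
  <name>mesos.hdfs.hadoop.heap.size</name>
  <!-- assumed larger heap for the JN; tune for your nodes -->
  <value>512</value>
</property>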

F21 commented 8 years ago

@elingg I made some progress! I got the JN to launch on my stripped-down setup with the HEAD of hdfs-mesos. I noticed that I had neglected to set executor_registration_timeout for it.
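
For reference, a sketch of the slave flag as I set it now (5mins is just the value I picked; the logs above show the default of 1mins killing the executor):

# give the executor more time to fetch its tarballs and register
mesos-slave --executor_registration_timeout=5mins ...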

Currently, the clusters are launched using Vagrant.

Since Mesos 0.24 was just released, I am going to build the binaries and set up a CoreOS cluster again to see if it resolves the problem.

elingg commented 8 years ago

@F21, that's great news! executor_registration_timeout is a common issue (which is why I mentioned it above), as the container will be destroyed if the executor does not register in time.

It's awesome that Ubuntu is working. CoreOS should be fine too, but remember the comment I made about the symlink permissions issue; to get around that, we use the predistributed binaries option.

F21 commented 8 years ago

The strange thing is that I always had the MESOS_EXECUTOR_REGISTRATION_TIMEOUT environment variable set on all my CoreOS slaves, so it really shouldn't be the cause on CoreOS. I will try adding it to --executor_environment_variables to see if I can get it going on CoreOS.
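
A sketch of what I'll try, assuming the flag accepts an inline JSON object of environment variables (the 5mins value is just an example):

# pass the timeout through to the executor's environment as well
mesos-slave \
  --executor_registration_timeout=5mins \
  --executor_environment_variables='{"MESOS_EXECUTOR_REGISTRATION_TIMEOUT": "5mins"}' \
  ...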

elingg commented 8 years ago

Is this safe to close, @F21, with the MESOS_EXECUTOR_REGISTRATION_TIMEOUT environment variable fix?

F21 commented 8 years ago

@elingg Unfortunately, that didn't make any difference on CoreOS. I am still investigating to see why that's happening.

elingg commented 8 years ago

Understood; let us know if you have questions, @F21. And as a product plug: with DCOS, installation on CoreOS is as easy as one command :+1:

F21 commented 8 years ago

@elingg Is there any way to increase the verbosity of the executor?

Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.686725   841 slave.cpp:1249] Got assigned task task.journalnode.journalnode.NodeExecutor.1442442148388 for framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.689013   841 slave.cpp:1365] Launching task task.journalnode.journalnode.NodeExecutor.1442442148388 for framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.722630   841 slave.cpp:4799] Launching executor executor.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001 with resources cpus(*):0.5; mem(*):256 in work directory '/tmp/mesos/slaves/20150916-220225-169978048-5050-600-S2/frameworks/20150916-220225-169978048-5050-600-0001/executors/executor.journalnode.NodeExecutor.1442442148388/runs/46f63025-41f1-4bf0-b634-9e46a2b8770e'
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.725831   846 docker.cpp:739] No container info found, skipping launch
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.727078   841 slave.cpp:1583] Queuing task 'task.journalnode.journalnode.NodeExecutor.1442442148388' for executor executor.journalnode.NodeExecutor.1442442148388 of framework '20150916-220225-169978048-5050-600-0001
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.732386   844 containerizer.cpp:633] Starting container '46f63025-41f1-4bf0-b634-9e46a2b8770e' for executor 'executor.journalnode.NodeExecutor.1442442148388' of framework '20150916-220225-169978048-5050-600-0001'
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.740202   844 launcher.cpp:131] Forked child with pid '1477' for container '46f63025-41f1-4bf0-b634-9e46a2b8770e'
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.741034   844 containerizer.cpp:855] Checkpointing executor's forked pid 1477 to '/tmp/mesos/meta/slaves/20150916-220225-169978048-5050-600-S2/frameworks/20150916-220225-169978048-5050-600-0001/executors/executor.journalnode.NodeExecutor.1442442148388/runs/46f63025-41f1-4bf0-b634-9e46a2b8770e/pids/forked.pid'
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.783072   843 containerizer.cpp:1266] Executor for container '46f63025-41f1-4bf0-b634-9e46a2b8770e' has exited
Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.785030   843 containerizer.cpp:1079] Destroying container '46f63025-41f1-4bf0-b634-9e46a2b8770e'
Sep 16 22:22:32 core-03 mesos-slave[829]: 2015-09-16 22:22:32,621:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 159ms
Sep 16 22:22:35 core-03 mesos-slave[829]: 2015-09-16 22:22:35,977:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 22ms
Sep 16 22:22:39 core-03 mesos-slave[829]: 2015-09-16 22:22:39,322:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 13ms
Sep 16 22:22:41 core-03 mesos-slave[829]: I0916 22:22:41.547842   844 http.cpp:174] HTTP GET for /slave(1)/state.json from 192.168.33.1:59577 with User-Agent='Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.93 Safari/537.36'
Sep 16 22:22:42 core-03 mesos-slave[829]: 2015-09-16 22:22:42,708:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 51ms
Sep 16 22:22:45 core-03 mesos-slave[829]: I0916 22:22:45.218466   843 slave.cpp:3885] Current disk usage 11.00%. Max allowed age: 5.530020488612940days
Sep 16 22:23:45 core-03 mesos-slave[829]: I0916 22:23:45.228628   845 slave.cpp:3885] Current disk usage 12.02%. Max allowed age: 5.458615492878796days
Sep 16 22:24:12 core-03 mesos-slave[829]: 2015-09-16 22:24:12,989:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 166ms
Sep 16 22:24:23 core-03 mesos-slave[829]: 2015-09-16 22:24:23,056:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 52ms
Sep 16 22:24:45 core-03 mesos-slave[829]: I0916 22:24:45.232887   846 slave.cpp:3885] Current disk usage 13.88%. Max allowed age: 5.328164058364514days
Sep 16 22:24:49 core-03 mesos-slave[829]: 2015-09-16 22:24:49,816:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 56ms
Sep 16 22:25:06 core-03 mesos-slave[829]: 2015-09-16 22:25:06,997:829(0x7faf52c83700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 495ms
Sep 16 22:25:20 core-03 mesos-slave[829]: E0916 22:25:20.980870   845 slave.cpp:3301] Container '46f63025-41f1-4bf0-b634-9e46a2b8770e' for executor 'executor.journalnode.NodeExecutor.1442442148388' of framework '20150916-220225-169978048-5050-600-0001' failed to start: Container destroyed during launch
Sep 16 22:25:20 core-03 mesos-slave[829]: W0916 22:25:20.981760   845 containerizer.cpp:1068] Ignoring destroy of unknown container: 46f63025-41f1-4bf0-b634-9e46a2b8770e
Sep 16 22:25:20 core-03 mesos-slave[829]: E0916 22:25:20.981994   841 slave.cpp:3383] Termination of executor 'executor.journalnode.NodeExecutor.1442442148388' of framework '20150916-220225-169978048-5050-600-0001' failed: Unknown container: 46f63025-41f1-4bf0-b634-9e46a2b8770e
Sep 16 22:25:20 core-03 mesos-slave[829]: I0916 22:25:20.983971   841 slave.cpp:2696] Handling status update TASK_LOST (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001 from @0.0.0.0:0
Sep 16 22:25:20 core-03 mesos-slave[829]: W0916 22:25:20.985064   844 containerizer.cpp:970] Ignoring update for unknown container: 46f63025-41f1-4bf0-b634-9e46a2b8770e
Sep 16 22:25:20 core-03 mesos-slave[829]: I0916 22:25:20.985843   845 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:25:20 core-03 mesos-slave[829]: I0916 22:25:20.986872   845 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:25:20 core-03 mesos-slave[829]: I0916 22:25:20.987951   844 slave.cpp:2975] Forwarding the update TASK_LOST (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001 to master@192.168.33.10:5050
Sep 16 22:25:21 core-03 mesos-slave[829]: I0916 22:25:21.061379   840 status_update_manager.cpp:394] Received status update acknowledgement (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:25:21 core-03 mesos-slave[829]: I0916 22:25:21.061650   840 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_LOST (UUID: c7e443a4-886c-4c97-af71-951247001a79) for task task.journalnode.journalnode.NodeExecutor.1442442148388 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:25:21 core-03 mesos-slave[829]: I0916 22:25:21.062326   840 slave.cpp:3503] Cleaning up executor 'executor.journalnode.NodeExecutor.1442442148388' of framework 20150916-220225-169978048-5050-600-0001

Is the executor meant to exit? I think this line in the mesos-slave log looks a bit suspicious:

Sep 16 22:22:28 core-03 mesos-slave[829]: I0916 22:22:28.783072   843 containerizer.cpp:1266] Executor for container '46f63025-41f1-4bf0-b634-9e46a2b8770e' has exited

Interestingly, Mesos tries to relaunch the executor on the same node a few times, and then the mesos-slave crashes and goes offline (the repeated fetches fill the disk, and checkpointing the TASK_LOST update fails with "No space left on device"):

Sep 16 22:30:45 core-03 mesos-slave[829]: I0916 22:30:45.369068   842 slave.cpp:3885] Current disk usage 88.59%. Max allowed age: 2.363740759884445hrs
Sep 16 22:31:09 core-03 mesos-slave[829]: E0916 22:31:09.222362   843 fetcher.cpp:515] Failed to run mesos-fetcher: Failed to fetch all URIs for container '4788225d-fb42-44fe-b44d-d8bd67f144c9' with exit status: 256
Sep 16 22:31:09 core-03 mesos-slave[829]: E0916 22:31:09.223407   843 slave.cpp:3301] Container '4788225d-fb42-44fe-b44d-d8bd67f144c9' for executor 'executor.journalnode.NodeExecutor.1442442573784' of framework '20150916-220225-169978048-5050-600-0001' failed to start: Failed to fetch all URIs for container '4788225d-fb42-44fe-b44d-d8bd67f144c9' with exit status: 256
Sep 16 22:31:09 core-03 mesos-slave[829]: W0916 22:31:09.223945   843 containerizer.cpp:1068] Ignoring destroy of unknown container: 4788225d-fb42-44fe-b44d-d8bd67f144c9
Sep 16 22:31:09 core-03 mesos-slave[829]: E0916 22:31:09.224094   842 slave.cpp:3383] Termination of executor 'executor.journalnode.NodeExecutor.1442442573784' of framework '20150916-220225-169978048-5050-600-0001' failed: Unknown container: 4788225d-fb42-44fe-b44d-d8bd67f144c9
Sep 16 22:31:09 core-03 mesos-slave[829]: I0916 22:31:09.227615   842 slave.cpp:2696] Handling status update TASK_LOST (UUID: f5f8ce65-88c5-41f0-93ab-41e82022efc5) for task task.journalnode.journalnode.NodeExecutor.1442442573784 of framework 20150916-220225-169978048-5050-600-0001 from @0.0.0.0:0
Sep 16 22:31:09 core-03 mesos-slave[829]: W0916 22:31:09.228691   847 containerizer.cpp:970] Ignoring update for unknown container: 4788225d-fb42-44fe-b44d-d8bd67f144c9
Sep 16 22:31:09 core-03 mesos-slave[829]: I0916 22:31:09.230973   842 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: f5f8ce65-88c5-41f0-93ab-41e82022efc5) for task task.journalnode.journalnode.NodeExecutor.1442442573784 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:31:09 core-03 mesos-slave[829]: I0916 22:31:09.231657   842 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: f5f8ce65-88c5-41f0-93ab-41e82022efc5) for task task.journalnode.journalnode.NodeExecutor.1442442573784 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:31:09 core-03 mesos-slave[829]: F0916 22:31:09.232223   842 slave.cpp:2897] CHECK_READY(future): is FAILED: Failed to write status update TASK_LOST (UUID: f5f8ce65-88c5-41f0-93ab-41e82022efc5) for task task.journalnode.journalnode.NodeExecutor.1442442573784 of framework 20150916-220225-169978048-5050-600-0001 to '/tmp/mesos/meta/slaves/20150916-220225-169978048-5050-600-S2/frameworks/20150916-220225-169978048-5050-600-0001/executors/executor.journalnode.NodeExecutor.1442442573784/runs/4788225d-fb42-44fe-b44d-d8bd67f144c9/tasks/task.journalnode.journalnode.NodeExecutor.1442442573784/task.updates': Failed to write size: No space left on device Failed to handle status update TASK_LOST (UUID: f5f8ce65-88c5-41f0-93ab-41e82022efc5) for task task.journalnode.journalnode.NodeExecutor.1442442573784 of framework 20150916-220225-169978048-5050-600-0001
Sep 16 22:31:09 core-03 mesos-slave[829]: *** Check failure stack trace: ***
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e151ac0  google::LogMessage::Fail()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e151a0c  google::LogMessage::SendToLog()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e15140e  google::LogMessage::Flush()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e154322  google::LogMessageFatal::~LogMessageFatal()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5d3ff2d8  _CheckFatal::~_CheckFatal()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5d865c24  mesos::internal::slave::Slave::__statusUpdate()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5d89e452  _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureI7NothingEERKNS2_12StatusUpdateERKNS_4UPIDES7_SA_SD_EEvRKNS_3PIDIT_EEMSH_FvT0_T1_T2_ET3_T4_T5_ENKUlPNS_11ProcessBaseEE_clESU_
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5d8d1dd3  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureI7NothingEERKNS6_12StatusUpdateERKNS0_4UPIDESB_SE_SH_EEvRKNS0_3PIDIT_EEMSL_FvT0_T1_T2_ET3_T4_T5_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e0dbcf1  std::function<>::operator()()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e0c5545  process::ProcessBase::visit()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e0c8398  process::DispatchEvent::visit()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5d3de37a  process::ProcessBase::serve()
Sep 16 22:31:09 core-03 mesos-slave[829]: @     0x7faf5e0c1a4c  process::ProcessManager::resume()
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5e0b5f58  process::internal::schedule()
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5e113be7  _ZNSt12_Bind_simpleIFPFvvEvEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5e113b41  std::_Bind_simple<>::operator()()
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5e113ada  std::thread::_Impl<>::_M_run()
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5a1c1d73  (unknown)
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf5a42566c  (unknown)
Sep 16 22:31:10 core-03 mesos-slave[829]: @     0x7faf599202ed  (unknown)
elingg commented 8 years ago

Yes, that is a suspicious containerizer issue. It points to a possible Mesos containerizer issue related to systemd or not giving the executor enough time to register (i.e. executor registration time out).

F21 commented 8 years ago

@elingg I think I might be one step closer to finding the problem.

I noticed that the hdfs-site.xml file downloaded to the executor's sandbox has some issues:

<property>
    <name>dfs.namenode.rpc-address.hdfs.nn1</name>
    <value>:50071</value>
</property>

<property>
    <name>dfs.namenode.http-address.hdfs.nn1</name>
    <value>:50070</value>
</property>

It looks like ${nn1Hostname} expands to an empty string, so the values come out as :50071 and :50070.

How is the value of nn1Hostname determined? My mesos-slaves are launched with --hostname=${COREOS_PRIVATE_IPV4} --ip=${COREOS_PRIVATE_IPV4} so that everything is done with IP addresses and I don't have to add the hostnames of all my nodes to my /etc/hosts.

elingg commented 8 years ago

I don't think that's the issue, as the NNs have not yet launched, so that value is expected to be blank until they do. You should instead check the *-site.xml files in the hdfs-mesos-executor-0.1.4/etc/hadoop folder in your sandbox.
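
(i.e., from inside the task sandbox on the slave, something along the lines of:)

# inspect the rendered configs the executor actually uses
cat hdfs-mesos-executor-0.1.4/etc/hadoop/*-site.xml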

F21 commented 8 years ago

Within the hdfs-mesos-executor-0.1.4/etc/hadoop folder, things look fine.

mesos-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mesos.hdfs.data.dir</name>
    <description>The primary data directory in HDFS</description>
    <value>/var/lib/hdfs/data</value>
  </property>

  <property>
    <name>mesos.hdfs.secondary.data.dir</name>
    <description>The secondary data directory in HDFS</description>
    <value>/var/run/hadoop-hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.native-hadoop-binaries</name>
    <description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
    <value>false</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.mnt.path</name>
    <description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
    <value>/opt/mesosphere</value>
  </property>

  <property>
    <name>mesos.hdfs.state.zk</name>
    <description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.master.uri</name>
    <description>Zookeeper entry for mesos master location</description>
    <value>zk://master.mesos:2181/mesos</value>
  </property>

  <property>
    <name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
    <description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.name</name>
    <description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
    <value>hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns</name>
    <description>Whether to use Mesos DNS for service discovery within HDFS</description>
    <value>true</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns.domain</name>
    <description>Root domain name of Mesos DNS (usually 'mesos')</description>
    <value>mesos</value>
  </property>

  <property>
    <name>mesos.native.library</name>
    <description>Location of libmesos.so</description>
    <value>/opt/test/packages/mesos/lib/libmesos.so</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.count</name>
    <description>Number of journal nodes (must be odd)</description>
    <value>1</value>
  </property>

  <!-- Additional settings for fine-tuning -->
  <property>
    <name>mesos.hdfs.jvm.overhead</name>
    <description>Multiplier on resources reserved in order to account for JVM allocation</description>
    <value>1</value>
  </property>

  <property>
    <name>mesos.hdfs.hadoop.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.user</name>
    <value>root</value>
  </property>

  <property>
    <name>mesos.hdfs.role</name>
    <value>*</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.nameservice.id</name>
        <value>${frameworkName}</value>
    </property>

    <property>
        <name>dfs.nameservices</name>
        <value>${frameworkName}</value>
    </property>

    <property>
        <name>dfs.ha.namenodes.${frameworkName}</name>
        <value>nn1</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.${frameworkName}.nn1</name>
        <value>${nn1Hostname}:50071</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.${frameworkName}.nn1</name>
        <value>${nn1Hostname}:50070</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.${frameworkName}</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://${journalnodes}/${frameworkName}</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>${haZookeeperQuorum}</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${dataDir}/jn</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${dataDir}/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${dataDir}/data</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>10485760</value>
    </property>

    <property>
        <name>dfs.datanode.balance.bandwidthPerSec</name>
        <value>41943040</value>
    </property>

    <property>
        <name>dfs.namenode.safemode.threshold-pct</name>
        <value>0.90</value>
    </property>

    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <!-- 60 seconds -->
        <value>60000</value>
    </property>

    <property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>20</value>
    </property>

    <property>
        <name>dfs.image.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.image.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

    <property>
        <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
        <value>0.35f</value>
    </property>

    <property>
        <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
        <value>4</value>
    </property>

    <!-- This property allows us to use IP's directly for communication instead of hostnames. -->
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size</name>
        <value>1000</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size.expiry.ms</name>
        <value>1000</value>
    </property>

    <!-- This property needs to be consistent with mesos.hdfs.secondary.data.dir -->
    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>
</configuration>

I tried running ./bin/hdfs-mesos-journalnode, but it wasn't able to expand the variables in the config files:

15/09/16 23:38:20 INFO server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
Exception in thread "main" java.lang.IllegalArgumentException: Journal dir '${dataDir}/jn' should be an absolute path
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:112)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:136)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:126)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:295)

Is there any way to turn up the verbosity of the executor so that I can get some output in stderr and stdout? Or perhaps a way to run the executor for the journal node manually from the command line to see if there's an error?
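
(One thing I may try in the meantime is raising the glog verbosity that the executor-side libmesos sees, something like the sketch below; I'm not sure how much extra output it actually yields:)

# GLOG_v is the standard glog verbosity knob honored by the Mesos libraries
/usr/sbin/mesos-slave \
  --master=zk://master.mesos:2181/mesos \
  --executor_environment_variables='{"GLOG_v": "1"}'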

F21 commented 8 years ago

I copied the expanded hdfs-site.xml to etc/hadoop/hdfs-site.xml and launched hdfs-mesos-journalnode, and it appears to start properly:

STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r 4cda8416c73034b59cc8baafbe3666b074472846; compiled by 'jenkins' on 2015-01-28T00:37Z
STARTUP_MSG:   java = 1.7.0_76
************************************************************/
15/09/17 00:02:26 INFO server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/17 00:02:26 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
15/09/17 00:02:27 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
15/09/17 00:02:27 INFO impl.MetricsSystemImpl: JournalNode metrics system started
15/09/17 00:02:29 INFO hdfs.DFSUtil: Starting web server as: null
15/09/17 00:02:29 INFO hdfs.DFSUtil: Starting Web-server for journal at: http://0.0.0.0:8480
15/09/17 00:02:29 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
15/09/17 00:02:29 INFO http.HttpRequestLog: Http request log for http.requests.journal is not defined
15/09/17 00:02:29 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
15/09/17 00:02:29 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context journal
15/09/17 00:02:29 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
15/09/17 00:02:29 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
15/09/17 00:02:29 INFO http.HttpServer2: Jetty bound to port 8480
15/09/17 00:02:29 INFO mortbay.log: jetty-6.1.26.cloudera.4
15/09/17 00:02:30 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8480
15/09/17 00:02:32 INFO ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
15/09/17 00:02:32 INFO ipc.Server: Starting Socket Reader #1 for port 8485
15/09/17 00:02:32 INFO ipc.Server: IPC Server Responder: starting
15/09/17 00:02:32 INFO ipc.Server: IPC Server listener on 8485: starting
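
For clarity, by "the expanded hdfs-site.xml" I mean the one the fetcher downloaded into the sandbox root (the scheduler serves it with the placeholders already substituted). Roughly, from the sandbox:

# copy the scheduler-rendered config over the unexpanded template,
# then start the journal node by hand
cp hdfs-site.xml hdfs-mesos-executor-0.1.4/etc/hadoop/hdfs-site.xml
cd hdfs-mesos-executor-0.1.4 && ./bin/hdfs-mesos-journalnode
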
elingg commented 8 years ago

My best guess based on your findings would be a containerizer issue on CoreOS or an executor registration timeout issue. It might be best to check with the Mesos core team.

F21 commented 8 years ago

@elingg I have made some progress!

The current stack is CoreOS 808.0.0, Mesos 0.24.0, and Marathon 0.10.0.

I compiled Mesos with the prefix set to /opt/test/mesos and deployed it there. In addition, LD_LIBRARY_PATH is set to /opt/test/mesos/lib.
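
(For completeness, the build was the stock autotools flow; a sketch, with the parallelism level being arbitrary:)

# build and install Mesos 0.24.0 under a custom prefix
./configure --prefix=/opt/test/mesos
make -j4
sudo make install
# make libmesos discoverable at runtime
export LD_LIBRARY_PATH=/opt/test/mesos/lib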

I am now able to get some useful information out of stderr for the journal node executor.

The mesos-site.xml I am currently building mesos-hdfs with is:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mesos.hdfs.data.dir</name>
    <description>The primary data directory in HDFS</description>
    <value>/var/lib/hdfs/data</value>
  </property>

  <property>
    <name>mesos.hdfs.secondary.data.dir</name>
    <description>The secondary data directory in HDFS</description>
    <value>/var/run/hadoop-hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.native-hadoop-binaries</name>
    <description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
    <value>false</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.mnt.path</name>
    <description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
    <value>/opt/mesosphere</value>
  </property>

  <property>
    <name>mesos.hdfs.state.zk</name>
    <description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.master.uri</name>
    <description>Zookeeper entry for mesos master location</description>
    <value>zk://master.mesos:2181/mesos</value>
  </property>

  <property>
    <name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
    <description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
    <value>master.mesos:2181</value>
  </property>

  <property>
    <name>mesos.hdfs.framework.name</name>
    <description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
    <value>hdfs</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns</name>
    <description>Whether to use Mesos DNS for service discovery within HDFS</description>
    <value>true</value>
  </property>

  <property>
    <name>mesos.hdfs.mesosdns.domain</name>
    <description>Root domain name of Mesos DNS (usually 'mesos')</description>
    <value>mesos</value>
  </property>

  <property>
    <name>mesos.native.library</name>
    <description>Location of libmesos.so</description>
    <value>/opt/test/mesos/lib/libmesos.so</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.count</name>
    <description>Number of journal nodes (must be odd)</description>
    <value>1</value>
  </property>

  <!-- Additional settings for fine-tuning -->
  <property>
    <name>mesos.hdfs.jvm.overhead</name>
    <description>Multiplier on resources reserved in order to account for JVM allocation</description>
    <value>1</value>
  </property>

  <property>
    <name>mesos.hdfs.hadoop.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.heap.size</name>
    <value>512</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.heap.size</name>
    <value>256</value>
  </property>

  <property>
    <name>mesos.hdfs.executor.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.namenode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.journalnode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.datanode.cpus</name>
    <value>0.5</value>
  </property>

  <property>
    <name>mesos.hdfs.user</name>
    <value>root</value>
  </property>

  <property>
    <name>mesos.hdfs.role</name>
    <value>*</value>
  </property>

  <property>
    <name>mesos.hdfs.ld-library-path</name>
    <value>/opt/test/mesos/lib</value>
  </property>
</configuration>

This is the output from the journal node executor's stderr:

I0923 06:26:57.411906  1385 logging.cpp:172] INFO level logging started!
I0923 06:26:57.412338  1385 fetcher.cpp:414] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150923-054500-169978048-5050-1212-S3\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.13:10000\/hdfs-mesos-executor-0.1.4.tgz"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"http:\/\/192.168.33.13:10000\/hdfs-site.xml"}},{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"https:\/\/downloads.mesosphere.io\/java\/jre-7u76-linux-x64.tar.gz"}}],"sandbox_directory":"\/var\/lib\/mesos\/sandbox\/slaves\/20150923-054500-169978048-5050-1212-S3\/frameworks\/20150923-054500-169978048-5050-1212-0001\/executors\/executor.journalnode.NodeExecutor.1442989617044\/runs\/4d1ad220-b667-4d8a-b08a-83a48914e9d3","user":"root"}
I0923 06:26:57.418102  1385 fetcher.cpp:369] Fetching URI 'http://192.168.33.13:10000/hdfs-mesos-executor-0.1.4.tgz'
I0923 06:26:57.418159  1385 fetcher.cpp:243] Fetching directly into the sandbox directory
I0923 06:26:57.418212  1385 fetcher.cpp:180] Fetching URI 'http://192.168.33.13:10000/hdfs-mesos-executor-0.1.4.tgz'
I0923 06:26:57.418263  1385 fetcher.cpp:127] Downloading resource from 'http://192.168.33.13:10000/hdfs-mesos-executor-0.1.4.tgz' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-mesos-executor-0.1.4.tgz'
I0923 06:26:57.867511  1385 fetcher.cpp:76] Extracting with command: tar -C '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3' -xf '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-mesos-executor-0.1.4.tgz'
I0923 06:26:59.066606  1385 fetcher.cpp:84] Extracted '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-mesos-executor-0.1.4.tgz' into '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3'
W0923 06:26:59.066730  1385 fetcher.cpp:265] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.13:10000/hdfs-mesos-executor-0.1.4.tgz
I0923 06:26:59.066947  1385 fetcher.cpp:446] Fetched 'http://192.168.33.13:10000/hdfs-mesos-executor-0.1.4.tgz' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-mesos-executor-0.1.4.tgz'
I0923 06:26:59.066982  1385 fetcher.cpp:369] Fetching URI 'http://192.168.33.13:10000/hdfs-site.xml'
I0923 06:26:59.067008  1385 fetcher.cpp:243] Fetching directly into the sandbox directory
I0923 06:26:59.067049  1385 fetcher.cpp:180] Fetching URI 'http://192.168.33.13:10000/hdfs-site.xml'
I0923 06:26:59.067090  1385 fetcher.cpp:127] Downloading resource from 'http://192.168.33.13:10000/hdfs-site.xml' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-site.xml'
W0923 06:26:59.134197  1385 fetcher.cpp:265] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: http://192.168.33.13:10000/hdfs-site.xml
I0923 06:26:59.135251  1385 fetcher.cpp:446] Fetched 'http://192.168.33.13:10000/hdfs-site.xml' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-site.xml'
I0923 06:26:59.135306  1385 fetcher.cpp:369] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0923 06:26:59.135496  1385 fetcher.cpp:243] Fetching directly into the sandbox directory
I0923 06:26:59.135550  1385 fetcher.cpp:180] Fetching URI 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz'
I0923 06:26:59.135594  1385 fetcher.cpp:127] Downloading resource from 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/jre-7u76-linux-x64.tar.gz'
I0923 06:30:25.936226  1385 fetcher.cpp:76] Extracting with command: tar -C '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3' -xf '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/jre-7u76-linux-x64.tar.gz'
I0923 06:30:27.865962  1385 fetcher.cpp:84] Extracted '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/jre-7u76-linux-x64.tar.gz' into '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3'
W0923 06:30:27.866242  1385 fetcher.cpp:265] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz
I0923 06:30:27.866333  1385 fetcher.cpp:446] Fetched 'https://downloads.mesosphere.io/java/jre-7u76-linux-x64.tar.gz' to '/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/jre-7u76-linux-x64.tar.gz'
I0923 06:30:30.748337  1419 exec.cpp:133] Version: 0.24.0
I0923 06:30:30.763150  1439 exec.cpp:207] Executor registered on slave 20150923-054500-169978048-5050-1212-S3
unlink: cannot unlink '/opt/mesosphere/hdfs': No such file or directory

This is the output of stdout:

MESOS_DIRECTORY=/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3
LD_LIBRARY_PATH=/opt/test/mesos/lib
MESOS_EXECUTOR_ID=executor.journalnode.NodeExecutor.1442989617044
PATH=/opt/test/java/bin:/usr/sbin:/usr/bin
PWD=/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3
JAVA_HOME=/var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/jre1.7.0_76
MESOS_NATIVE_LIBRARY=/opt/test/mesos/lib/libmesos-0.24.0.so
MESOS_SLAVE_PID=slave(1)@192.168.33.13:5051
MESOS_FRAMEWORK_ID=20150923-054500-169978048-5050-1212-0001
MESOS_CHECKPOINT=1
EXECUTOR_OPTS=-Xmx256m -Xms256m
SHLVL=1
LIBPROCESS_PORT=0
MESOS_SLAVE_ID=20150923-054500-169978048-5050-1212-S3
MESOS_RECOVERY_TIMEOUT=15mins
_=/usr/bin/env
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
06:30:30.839 [Thread-3] INFO  o.a.m.h.e.AbstractNodeExecutor - Creating a symbolic link for HDFS binary
06:30:30.843 [Thread-3] INFO  org.apache.mesos.hdfs.file.FileUtils - data dir exits:/opt/mesosphere
06:30:31.643 [Thread-3] INFO  o.a.m.h.e.AbstractNodeExecutor - Unable to unlink old sym link. Link may not exist. Exit code: 1
06:30:31.652 [Thread-3] INFO  o.a.m.h.e.AbstractNodeExecutor - The linked HDFS binary path is: /var/lib/mesos/sandbox/slaves/20150923-054500-169978048-5050-1212-S3/frameworks/20150923-054500-169978048-5050-1212-0001/executors/executor.journalnode.NodeExecutor.1442989617044/runs/4d1ad220-b667-4d8a-b08a-83a48914e9d3/hdfs-mesos-executor-0.1.4
06:30:31.652 [Thread-3] INFO  o.a.m.h.e.AbstractNodeExecutor - The symbolic link path is: /opt/mesosphere/hdfs
06:30:31.663 [Thread-3] ERROR o.a.m.h.e.AbstractNodeExecutor - Error creating the symbolic link to hdfs binary
java.io.FileNotFoundException: /usr/bin/hadoop (Read-only file system)
    at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_76]
    at java.io.FileOutputStream.<init>(Unknown Source) ~[na:1.7.0_76]
    at java.io.FileOutputStream.<init>(Unknown Source) ~[na:1.7.0_76]
    at org.apache.mesos.hdfs.executor.AbstractNodeExecutor.addBinaryToPath(AbstractNodeExecutor.java:168) [hdfs-executor-0.1.4-uber.jar:na]
    at org.apache.mesos.hdfs.executor.AbstractNodeExecutor.createSymbolicLink(AbstractNodeExecutor.java:147) [hdfs-executor-0.1.4-uber.jar:na]
    at org.apache.mesos.hdfs.executor.AbstractNodeExecutor.registered(AbstractNodeExecutor.java:82) [hdfs-executor-0.1.4-uber.jar:na]
06:30:31.665 [Thread-2] INFO  o.a.m.hdfs.executor.TaskShutdownHook - shutdown hook shutting down tasks
06:30:31.666 [Thread-2] INFO  o.a.mesos.hdfs.executor.NodeExecutor - Executor asked to shutdown

Any ideas what might be causing this?

F21 commented 8 years ago

I just remembered your comment regarding CoreOS being locked down. I then pre-distributed the Hadoop binaries (CDH distribution) to all my CoreOS nodes and added them to the PATH environment variable available to the mesos-slaves.
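
Roughly what I did on each node (paths and tarball name are my own choices):

# unpack the CDH Hadoop tarball somewhere writable on every CoreOS node
sudo mkdir -p /opt/hadoop
sudo tar -xzf hadoop-2.5.0-cdh5.3.1.tar.gz -C /opt/hadoop --strip-components=1
# then add /opt/hadoop/bin to PATH in the mesos-slave service's environment,
# so that executors inherit it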

Now, the journal node is able to launch, but it keeps complaining that the journal directory is not an absolute path:

************************************************************/
15/09/24 03:41:13 INFO server.JournalNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting JournalNode
STARTUP_MSG:   host = core-01/192.168.33.10
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.5.0-cdh5.3.1
STARTUP_MSG:   classpath = /var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/etc/hadoop:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-compress-1.4.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/paranamer-2.3.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-configuration-1.6.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-cli-1.2.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/gson-2.2.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jersey-core-1.9.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/avro-1.7.6-cdh5.3.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169
978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-io-2.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-lang-2.6.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/httpcore-4.2.5.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/junit-4.11.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jsch-0.1.42.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-codec-1.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jettison-1.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/hamcrest-core-1.3.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-000
0/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/zookeeper-3.4.5-cdh5.3.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-collections-3.2.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/log4j-1.2.17.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/mockito-all-1.8.5.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/netty-3.6.2.Final.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecut
or.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jsp-api-2.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jetty-6.1.26.cloudera.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-net-3.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-digester-1.8.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jetty-util-6.1.26.cloudera.4.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-math3-3.1.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/curator-framework-2.6.0.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jets3t-0.9.0.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/commons-httpclient-3.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/activation-1.1.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/jsr305-1.3.9.jar:/var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f
-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4/share/hadoop/common/lib/stax-api-1.0-2.jar: [classpath truncated: the same sandbox run directory, /var/lib/mesos/sandbox/slaves/20150924-031430-169978048-5050-596-S0/frameworks/20150924-031430-169978048-5050-596-0000/executors/executor.journalnode.NodeExecutor.1443065388745/runs/2d390c59-d00f-46d3-afa9-347f94588fdf/hdfs-mesos-executor-0.1.4, is repeated for every bundled Hadoop common and HDFS jar] :/share/hadoop/yarn/*:/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
STARTUP_MSG:   build = http://github.com/cloudera/hadoop -r 4cda8416c73034b59cc8baafbe3666b074472846; compiled by 'jenkins' on 2015-01-28T00:37Z
STARTUP_MSG:   java = 1.7.0_76
************************************************************/
15/09/24 03:41:13 INFO server.JournalNode: registered UNIX signal handlers for [TERM, HUP, INT]
Exception in thread "main" java.lang.IllegalArgumentException: Journal dir '${dataDir}/jn' should be an absolute path
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.validateAndCreateJournalDir(JournalNode.java:112)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:136)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:126)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:295)
15/09/24 03:41:13 INFO server.JournalNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down JournalNode at core-01/192.168.33.10
************************************************************/
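
The exception is Hadoop's own sanity check: the JournalNode refuses any dfs.journalnode.edits.dir that is not an absolute filesystem path, and the unsubstituted template placeholder ${dataDir}/jn fails that test. A minimal sketch of the equivalent check (an illustration of the rule, not the JournalNode's actual source):

import java.io.File;

public class JournalDirCheck {
    public static void main(String[] args) {
        // The literal value the JournalNode saw, from the stack trace above.
        String dir = "${dataDir}/jn";
        if (!new File(dir).isAbsolute()) {
            // This is the condition that produces the IllegalArgumentException.
            throw new IllegalArgumentException(
                    "Journal dir '" + dir + "' should be an absolute path");
        }
    }
}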

If I check in the sandbox, the hdfs-site.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.nameservice.id</name>
        <value>hdfs</value>
    </property>

    <property>
        <name>dfs.nameservices</name>
        <value>hdfs</value>
    </property>

    <property>
        <name>dfs.ha.namenodes.hdfs</name>
        <value>nn1</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.hdfs.nn1</name>
        <value>:50071</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.hdfs.nn1</name>
        <value>:50070</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.hdfs</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://192.168.33.10:8485/hdfs</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master.mesos:2181</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/var/lib/hdfs/data/jn</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///var/lib/hdfs/data/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///var/lib/hdfs/data/data</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>10485760</value>
    </property>

    <property>
        <name>dfs.datanode.balance.bandwidthPerSec</name>
        <value>41943040</value>
    </property>

    <property>
        <name>dfs.namenode.safemode.threshold-pct</name>
        <value>0.90</value>
    </property>

    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <!-- 60 seconds -->
        <value>60000</value>
    </property>

    <property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>20</value>
    </property>

    <property>
        <name>dfs.image.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.image.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

    <property>
        <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
        <value>0.35f</value>
    </property>

    <property>
        <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
        <value>4</value>
    </property>

    <!-- This property allows us to use IP's directly for communication instead of hostnames. -->
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size</name>
        <value>1000</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size.expiry.ms</name>
        <value>1000</value>
    </property>

    <!-- This property needs to be consistent with mesos.hdfs.secondary.data.dir -->
    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>
</configuration>

In hdfs-mesos-executor-0.1.4/etc/hadoop/hdfs-site.xml, I see a copy with unsubstituted template variables:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.nameservice.id</name>
        <value>${frameworkName}</value>
    </property>

    <property>
        <name>dfs.nameservices</name>
        <value>${frameworkName}</value>
    </property>

    <property>
        <name>dfs.ha.namenodes.${frameworkName}</name>
        <value>nn1</value>
    </property>

    <property>
        <name>dfs.namenode.rpc-address.${frameworkName}.nn1</name>
        <value>${nn1Hostname}:50071</value>
    </property>

    <property>
        <name>dfs.namenode.http-address.${frameworkName}.nn1</name>
        <value>${nn1Hostname}:50070</value>
    </property>

    <property>
        <name>dfs.client.failover.proxy.provider.${frameworkName}</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://${journalnodes}/${frameworkName}</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>${haZookeeperQuorum}</value>
    </property>

    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${dataDir}/jn</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${dataDir}/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${dataDir}/data</value>
    </property>

    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>shell(/bin/true)</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.datanode.du.reserved</name>
        <value>10485760</value>
    </property>

    <property>
        <name>dfs.datanode.balance.bandwidthPerSec</name>
        <value>41943040</value>
    </property>

    <property>
        <name>dfs.namenode.safemode.threshold-pct</name>
        <value>0.90</value>
    </property>

    <property>
        <name>dfs.namenode.heartbeat.recheck-interval</name>
        <!-- 60 seconds -->
        <value>60000</value>
    </property>

    <property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>20</value>
    </property>

    <property>
        <name>dfs.image.compress</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.image.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

    <property>
        <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
        <value>0.35f</value>
    </property>

    <property>
        <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
        <value>4</value>
    </property>

    <!-- This property allows us to use IP's directly for communication instead of hostnames. -->
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size</name>
        <value>1000</value>
    </property>

    <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size.expiry.ms</name>
        <value>1000</value>
    </property>

    <!-- This property needs to be consistent with mesos.hdfs.secondary.data.dir -->
    <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hadoop-hdfs/dn._PORT</value>
    </property>
</configuration>

Is there any reason why the JournalNode loads the template version of hdfs-site.xml rather than the rendered one?
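
To pin down which copy the process actually reads, one option is to ask Hadoop's Configuration which resource supplied the property. A minimal sketch (assuming the executor's bundled Hadoop jars are on the classpath; this is a diagnostic idea, not something the framework ships):

import org.apache.hadoop.conf.Configuration;

public class WhichConfigWon {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Loads the first hdfs-site.xml found on the classpath, just as the
        // JournalNode does.
        conf.addResource("hdfs-site.xml");
        String key = "dfs.journalnode.edits.dir";
        // getRaw() skips ${...} variable expansion, so an unrendered template
        // shows up as the literal ${dataDir}/jn from the stack trace above.
        System.out.println(key + " = " + conf.getRaw(key));
        String[] sources = conf.getPropertySources(key);
        if (sources != null) {
            for (String source : sources) {
                System.out.println("  provided by: " + source);
            }
        }
    }
}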

elingg commented 8 years ago

Glad to hear of your progress! If you use the pre-distributed binaries option (which makes sense, since CoreOS is locked down, as we discussed), you need to fill out hdfs-site.xml yourself and make sure it is configured properly. If you are using pre-distributed binaries together with Mesos DNS, my recommendation is to include the example Mesos DNS configs as part of your binaries, i.e. https://github.com/mesosphere/hdfs/tree/master/example-conf/mesosphere-dcos
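
If you do point a hand-maintained hdfs-site.xml at Mesos DNS names, it is worth confirming those names resolve from each node before starting the framework. A minimal sketch (master.mesos is simply the name used in the configs earlier in this thread; substitute whatever hostnames your config references):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class CheckMesosDns {
    public static void main(String[] args) {
        // master.mesos is the Mesos DNS name used elsewhere in this thread;
        // swap in the names your hand-written hdfs-site.xml references.
        String[] hosts = {"master.mesos"};
        for (String host : hosts) {
            try {
                System.out.println(host + " -> "
                        + InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                System.out.println(host + " does not resolve; check Mesos DNS"
                        + " before starting HDFS");
            }
        }
    }
}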

F21 commented 8 years ago

Closing this as I was able to successfully launch a POC cluster with 3 slaves and 1 master/slave, all running CoreOS!