mesos / hadoop

Hadoop on Mesos

Is the offering logging misleading? #44

Open hansbogert opened 9 years ago

hansbogert commented 9 years ago

Hi,

I can't get my cluster (80 CPUs, 200+ GB of memory) to allocate the last 8.5 CPUs. This message shows up repeatedly in the log:

15/03/09 11:56:12 INFO mapred.ResourcePolicy: Declining offer with insufficient resources for a TaskTracker:
  cpus: offered 0.8499999940395355 needed at least 0.15000000596046448
  mem : offered 20182.0 needed at least 368.0
  disk: offered 1859053.0 needed at least 0.0
  ports:  at least 2 (sufficient)

I'm not sure why Hadoop on Mesos is declining the offer; every resource requirement listed in the message has been met.

tarnfeld commented 9 years ago

Have you seen https://github.com/mesos/hadoop/issues/26? Maybe that's related?

hansbogert commented 9 years ago

Ahh, this explains it:

https://github.com/mesos/hadoop/blob/eef6c53436cf4f969d9d6c1bc58c0f9a7498e5c9/src/main/java/org/apache/hadoop/mapred/ResourcePolicyVariable.java#L23

Maybe we should add some log output to make this clearer.

For anyone hitting the same thing: the 'problem' is that the Mesos container running the TaskTracker is probably given its own CPU share through the property 'mapred.mesos.tasktracker.cpus', in my case 0.15. So in my case 7 CPUs (per node) are completely taken by slots, plus 0.15 for the TT. Mesos then tries to hand out the remaining 0.85 CPUs, but https://github.com/mesos/hadoop/blob/eef6c53436cf4f969d9d6c1bc58c0f9a7498e5c9/src/main/java/org/apache/hadoop/mapred/ResourcePolicyVariable.java#L23 returns false to https://github.com/mesos/hadoop/blob/eef6c53436cf4f969d9d6c1bc58c0f9a7498e5c9/src/main/java/org/apache/hadoop/mapred/ResourcePolicy.java#L277, and only then is the message from this issue's opening post logged. That message does not describe the actual reason for the decline accurately.
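A rough sketch of the arithmetic described above; this is not the code in ResourcePolicyVariable.java. It assumes that each slot needs 1.0 CPU (the post does not state the slot size) and that an offer is only useful when at least one slot fits next to the TaskTracker. The class and method names are hypothetical.

```java
// Hypothetical illustration only; not the actual mesos/hadoop ResourcePolicy code.
final class OfferCheck {
    // 0.15 is the value of 'mapred.mesos.tasktracker.cpus' quoted in the post.
    static final double TASKTRACKER_CPUS = 0.15;
    // Assumption for illustration: one slot needs a full CPU.
    static final double SLOT_CPUS = 1.0;

    /** CPUs left for slots once the TaskTracker's own share is reserved. */
    static double cpusLeftForSlots(double offeredCpus) {
        return offeredCpus - TASKTRACKER_CPUS;
    }

    /** Number of whole slots that fit in the offer; never negative. */
    static int slotsThatFit(double offeredCpus) {
        double slots = Math.floor(cpusLeftForSlots(offeredCpus) / SLOT_CPUS);
        return (int) Math.max(0.0, slots);
    }

    /** Under the stated assumption, an offer is usable only if one slot fits. */
    static boolean offerSufficient(double offeredCpus) {
        return slotsThatFit(offeredCpus) >= 1;
    }

    public static void main(String[] args) {
        double offered = 0.85; // the leftover CPUs per node from the log
        System.out.printf("left for slots: %.2f, slots: %d, sufficient: %b%n",
                cpusLeftForSlots(offered), slotsThatFit(offered), offerSufficient(offered));
    }
}
```

Under these assumptions the 0.85 CPU offer covers the 0.15 the log says is needed, yet leaves only about 0.7 CPU, which is less than one slot. That would explain a decline even though the logged minimum looks satisfied.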

ashwanthkumar commented 9 years ago

I have a similar issue:

2015-08-17 06:59:14,377 INFO org.apache.hadoop.mapred.ResourcePolicy: Declining offer with insufficient resources for a TaskTracker: 
  cpus: offered 5.0 needed at least 1.0
  mem : offered 1680.0 needed at least 1024.0
  disk: offered 15038.0 needed at least 1024.0
  ports:  at least 2 (sufficient)

My maximum map/reduce tasks are not set to 0. What else could be the issue?