lightbend / mesos-spark-integration-tests

Mesos Integration Tests on Docker/Ec2

fix-roles #47

Closed skonto closed 8 years ago

skonto commented 8 years ago
skyluc commented 8 years ago

Works fine, but I'm not sure about some of the cpu computations.

skonto commented 8 years ago

I propose to merge it and iterate in case there is a better option for the specific computations.

skyluc commented 8 years ago

OK, looking deeper, the problem I have is that not much in those tests is role specific.

The things that are role specific:

Not role specific:

The thing I didn't know is that Mesos also offers the resources in the `*` role when a role is provided. So in coarse-grained mode, it takes all the available cpus. What I think would be relevant is two tests for each mode: one with a role defined, one without. In the test without a role, only the resources in `*` are used. In the test with a role, all the cluster resources are used.
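For reference, the role a framework registers under is set at submit time. A hedged sketch, not taken from this repo: the master URL, jar, and role name are illustrative, while `spark.mesos.role` and `spark.mesos.coarse` are standard Spark-on-Mesos configuration properties:

```shell
# Illustrative only: submit a job registering under the "spark_only" Mesos role.
# spark.mesos.role makes the scheduler accept that role's reserved resources
# in addition to the unreserved "*" resources.
spark-submit \
  --master mesos://172.17.0.1:5050 \
  --conf spark.mesos.coarse=true \
  --conf spark.mesos.role=spark_only \
  --class org.apache.spark.examples.SparkPi \
  examples.jar 100
```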

It is pretty straightforward in coarse-grained mode, as the resources are just taken at the beginning. In fine-grained mode, it should be possible to do something like what I do in the dynamic allocation tests: wait, inside the RDD transformation, until the state of the cluster is what you expect. It also requires splitting the RDD into enough partitions to keep all the cores busy. https://github.com/typesafehub/mesos-spark-integration-tests/pull/46/files#diff-816f85a87cc7a75b49c9d8c088801581R57 In your case, the RDD transformation could return a different value depending on whether the cluster gets to the 'right' state: something like -1 if it is wrong, the original value if it is right. Then you check that the result of the computation is what you expect. If not, it means something was wrong (not really precise ...).
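The wait-then-return idea above can be sketched in plain Scala. This is only an illustration, not code from the PR: `currentCores` is a hypothetical helper standing in for a query against the Mesos master's state, and the timeout values are arbitrary.

```scala
object RoleCheck {
  // Poll until the cluster reaches the expected core count, or time out.
  // Return the original value if the state is right, -1 otherwise, so the
  // driver can detect a wrong cluster state from the job's result.
  def checkedValue(value: Int,
                   expectedCores: Int,
                   currentCores: () => Int,
                   timeoutMs: Long = 10000,
                   pollMs: Long = 100): Int = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (System.currentTimeMillis() < deadline) {
      if (currentCores() == expectedCores) return value
      Thread.sleep(pollMs)
    }
    -1
  }

  def main(args: Array[String]): Unit = {
    // Simulated cluster that already holds the expected 6 cores.
    println(checkedValue(42, expectedCores = 6, currentCores = () => 6))
    // Simulated cluster stuck at 4 cores: times out and returns -1.
    println(checkedValue(42, expectedCores = 6, currentCores = () => 4, timeoutMs = 200))
  }
}
```

In the real test, the body of the RDD transformation would call something like this per partition, and the driver would then assert on the summed result.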

skonto commented 8 years ago

I set the same standards for what someone should expect when checking role resources, no need to mention it again. I haven't reviewed your code yet, just whether it works or not... waiting is one way... but my proposal is to iterate, since this is a first attempt to cover some of the basic cases I found, and they are OK... so you can either merge it, or freeze it until it is 100% OK or until, for example, someone else has another idea as well...

skonto commented 8 years ago

I will check for improvements...

dragos commented 8 years ago

Let's merge and iterate from there.

dragos commented 8 years ago

Unfortunately I can't merge due to conflicts, but I'd say let's move on with this!

skonto commented 8 years ago

I will work on this along with the merge...

skonto commented 8 years ago
dragos commented 8 years ago

A couple of minor comments, but I would like to understand how the test is supposed to work.

skonto commented 8 years ago

I fixed the issues related to the comments; I have one case left to resolve. The test logic works as follows; we test the spark_only role:

2 slaves x (2 cpus for role `*` + 1 cpu for role spark_only) = 6 cpus

  1. Fine-grained, client mode: 6 cpus should be utilized by the framework; if that happens, it means the 2 role cpus (one per slave) are utilized, and the test is OK.
  2. Coarse-grained, client mode: 6 cpus should be utilized by the framework; if that happens, it means the 2 role cpus (one per slave) are utilized, and the test is OK.
  3. Fine-grained, cluster mode: 5 cpus should be utilized by the framework (6 minus the cpu taken by the Spark cluster driver); if that happens, it means the 2 role cpus (one per slave) are utilized, and the test is OK.
  4. Coarse-grained, cluster mode: working on it. Unfortunately case 4 is flaky; I get either 3 or 4 utilized cpus and am still investigating. I verify allocated cpus from the master log and from http://172.17.0.1:5050/#/frameworks, so I'm sure about what I'm getting...
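The expected counts above follow from simple arithmetic; a minimal sketch, assuming (as the thread does) that cluster mode costs exactly one cpu for the dispatched driver, which is precisely the assumption still being verified in case 4:

```scala
object ExpectedCores {
  // Each slave offers its unreserved "*" cpus plus its role-reserved cpus;
  // in cluster mode, one cpu is assumed to be taken by the dispatched driver.
  def expected(slaves: Int, starCpus: Int, roleCpus: Int, clusterMode: Boolean): Int = {
    val total = slaves * (starCpus + roleCpus)
    if (clusterMode) total - 1 else total
  }

  def main(args: Array[String]): Unit = {
    println(expected(2, 2, 1, clusterMode = false)) // client mode: 6
    println(expected(2, 2, 1, clusterMode = true))  // cluster mode: 5
  }
}
```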
dragos commented 8 years ago

OK, this LGTM. Merging this.