lightbend / mesos-spark-integration-tests

Mesos Integration Tests on Docker/Ec2

Improve UX - update docs #25

Closed skonto closed 8 years ago

skonto commented 8 years ago

We need to improve UX with a default script to run everything. Update the docs to explain how to pass a compiled Spark distro to the tests and to run.sh, and whether a user needs to set spark.executor.uri.

skonto commented 8 years ago

Accidentally closed it.

skonto commented 8 years ago

Change the message printed by the run.sh script for the allocated resources.

For example, on my local machine with 8 cores and 8 GB of RAM, running the script with 2 slaves prints:

Using 4 cpus and 4030M memory for slaves.

This should be changed to the following, because I get the same output even if I pass some resource configuration to the slaves: "Using 4 cpus and 4030M memory per slave. (WARN: this can be further changed if you pass specific slave configuration affecting resources. Check mesos UI for slave assigned resources.)"
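A minimal sketch of how the reworded message could be emitted from run.sh. The function name `print_slave_resources` and its two arguments are hypothetical; the variable names the script actually uses are not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the per-slave resource message proposed above.
# $1 = cpus per slave, $2 = memory per slave (for example 4030M)
print_slave_resources() {
  cpus="$1"
  mem="$2"
  echo "Using ${cpus} cpus and ${mem} memory per slave. (WARN: this can be further changed if you pass specific slave configuration affecting resources. Check mesos UI for slave assigned resources.)"
}

print_slave_resources 4 4030M
```

The warning stays in the same line as the numbers so that it cannot be missed when the output is grepped or truncated.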

skonto commented 8 years ago

We need to:

skonto commented 8 years ago

We have a default script. I will check the default resources.

skonto commented 8 years ago

The Spark driver in cluster mode itself needs cpu=1, mem=512 (ip:8081 shows that), so at any given time we need 2 cpus there. That is why the roleSpec tests run OK: they get 1 cpu from spark_role and one from (*). Ideally we would split the tests and run the role based tests independently, thus using only 2 cpus, but that needs dynamic reconfiguration of the slaves or re-creating the cluster.

RoleSpec is broken; it needs "spark.cores.max" -> "1". It is also interesting to note that in fine-grained mode the number of cores allocated to the Mesos executor is the same as the number allocated to the framework. If I set spark.mesos.mesosExecutor.cores=0.1, I get framework.resources.cpu=0.1.
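For reference, a sketch of the same settings written as spark-submit flags. The property names are real Spark properties; passing them on the command line (rather than through the test suite's own configuration), using spark_role as the value of spark.mesos.role, and the variable names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Fine-grained mode with a fractional cpu for the Mesos executor, as described above.
FINE_GRAINED_FLAGS="--conf spark.mesos.coarse=false --conf spark.mesos.mesosExecutor.cores=0.1"
# Role plus the cap on total cores that RoleSpec reportedly needs.
ROLE_SPEC_FLAGS="--conf spark.mesos.role=spark_role --conf spark.cores.max=1"

# Print the flags; they would be appended to a spark-submit invocation.
echo "${FINE_GRAINED_FLAGS}"
echo "${ROLE_SPEC_FLAGS}"
```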

For now valid setups in terms of cpu allocation are:

| spark_role | * | spark.mesos.mesosExecutor.cores | #slaves |
|---|---|---|---|
| 1 | 1 | 0 | 2 |
| 1 | 2 | 0 | 1 |
| 1 | 3 | 1 (default) | 1 |

I propose to use 4 cpus and 2 slaves for now. We may have to revise this in the future.

skonto commented 8 years ago

Another option is to reserve some resources before each role based test and release them when the test is finished. I think this is possible with dynamic reservation: http://mesos.apache.org/documentation/latest/reservation/
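A rough sketch of what reserve-before / unreserve-after could look like with the Mesos operator endpoints `/master/reserve` and `/master/unreserve`. The master URL, the slave id and the principal are placeholders, the helper names are hypothetical, and the JSON uses the older single-role reservation format; a cluster with authentication enabled would also need credentials (for example `curl -u`).

```shell
#!/usr/bin/env bash
# Placeholder master address; 5050 is the default Mesos master port.
MASTER="${MASTER:-http://localhost:5050}"

# Build the JSON resources array for a scalar cpu reservation.
# $1 = role, $2 = cpus, $3 = principal
cpu_reservation() {
  printf '[{"name":"cpus","type":"SCALAR","scalar":{"value":%s},"role":"%s","reservation":{"principal":"%s"}}]' "$2" "$1" "$3"
}

# Reserve resources before a role based test.
# $1 = slave id, $2 = resources JSON
reserve() {
  curl -s -d "slaveId=$1" --data-urlencode "resources=$2" -X POST "${MASTER}/master/reserve"
}

# Release the same resources once the test has finished.
unreserve() {
  curl -s -d "slaveId=$1" --data-urlencode "resources=$2" -X POST "${MASTER}/master/unreserve"
}

# Only print the payload here; reserve/unreserve need a running master.
cpu_reservation spark_role 1 test-principal
```

A test harness would call `reserve "$SLAVE_ID" "$(cpu_reservation spark_role 1 test-principal)"` in its setup step and `unreserve` with the same arguments in its teardown step, which avoids re-creating the cluster between test groups.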