luca-s / mpi-master-slave

Master Slave code in mpi4py
MIT License

Can Master also do the work? #6

Closed louisXW closed 5 years ago

louisXW commented 5 years ago

Hi luca-s,

Thanks for your very useful tips for mpi4py!! I am wondering: can the master also be a slave? That is, the master would not only handle I/O but also act as a slave that finishes the work it assigns to itself. Then, if a machine has n cores, only n processes would be launched with mpirun -np n python *.py. This is slightly different from the example you described here with mpirun -np n+1 python *.py.

What would be the potential issues with a parallelization scheme in which the master is also a slave?

BTW, could you explain a little bit more about the example you described here with "mpirun -np n+1 python *.py": how are these n+1 processes allocated onto the n cores? Is the additional process placed on one of the n cores, so that one core ends up handling 2 processes? Or are the n+1 processes spread equally across the n cores?

Best.

luca-s commented 5 years ago

What would be the potential issues with a parallelization scheme in which the master is also a slave?

The master has to keep checking the work_queue (self.work_queue.done()) and dispatch work to the slaves (self.work_queue.do_work()). So if the master is busy doing something else, it can happen that slaves are ready to work but the master doesn't start them, which is a waste of resources.

BTW, could you explain a little bit more about the example you described here with "mpirun -np n+1 python *.py": how are these n+1 processes allocated onto the n cores? Is the additional process placed on one of the n cores, so that one core ends up handling 2 processes? Or are the n+1 processes spread equally across the n cores?

How the running processes are assigned to the cores depends on the operating system (it is the responsibility of the scheduler). I believe it is safe to assume that the OS is smart enough to assign one process to each core if the number of running processes is less than or equal to the number of available cores. If the number of running processes is greater than the number of cores, then the cores are shared: each process gets a time slot in which it can run on a core, then the same core is given to another process, and so on in a round-robin fashion (the details depend on the OS scheduler).

The important detail is that the master process calls time.sleep(0.3), which tells the OS that it doesn't want to run for the next 0.3 seconds. So, for the next 0.3 sec, the OS only has the other N slaves to run, and each of them can be assigned to one of the N cores. After the 0.3 sec have elapsed, the OS again has N+1 processes to run (including the master). The master will be assigned a time slot, in which it checks the work_queue and possibly dispatches new tasks to slaves. Since the master's work takes only a few milliseconds and the master then goes back to sleep for another 0.3 sec, the result is that the master is running (using a core) for only a few milliseconds every 0.3 seconds. That is, the master is sleeping most of the time (~99.9% of the time). Hence, for ~99.9% of the time, there are only N slaves running, one on each of the N cores.
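
To make this concrete, here is a condensed sketch of the master loop I am describing, paraphrased from Example 1 in the README (check the README for the full MyApp class and the exact signatures; the names below follow that example):

    import time
    from mpi4py import MPI
    from mpi_master_slave import Master, WorkQueue   # names as in the README example

    # Condensed sketch of the master side: rank 0 is the master, every other
    # rank runs the Slave code shown in the README.
    size = MPI.COMM_WORLD.Get_size()
    master = Master(list(range(1, size)))        # the slave ranks this master controls
    work_queue = WorkQueue(master)

    for i in range(10):
        work_queue.add_work(('do task', i))      # queue the tasks to be executed

    while not work_queue.done():
        work_queue.do_work()                     # hand queued tasks to idle slaves (a few ms)
        for slave_return_data in work_queue.get_completed_work():
            print('Master: a slave returned', slave_return_data)
        time.sleep(0.3)                          # give up the core for the next 0.3 seconds

    master.terminate_slaves()                    # tell the slaves to exit once all work is done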

louisXW commented 5 years ago

Thanks for the in-depth analysis! I am wondering whether there is any tool that could give insight into how the processes are scheduled across the multiple cores, or a way to tell which scheduler is used on my system.

In my case, I run the master-slave framework with "mpirun -np n+1 python *.py" and each slave solves expensive partial differential equation (PDE) problems. I observed that there is always one slave finishing its task much more slowly (about twice as slow) than the other slaves. It seems that the slower slave is sharing a core with the master. In my case the master task is cheap while the slaves do the heavy work, since the master only needs the final result returned from each slave. So what really puzzles me is: why is one slave twice as slow as the others?

Could you please provide some suggestions on this?

luca-s commented 5 years ago

Your hypothesis that the slower slave is sharing a core with the master seems plausible. Still, I don't understand why the slave that shares the core with the master should be almost twice as slow as the others, since the master should be sleeping most of the time.

I wonder how much time the slaves spend on average to finish their tasks. Let's say that on average a slave spends at least a few seconds per task; then the overhead of exchanging data with the master, and the time the master spends starting the slaves and gathering their results, is negligible. On the other hand, if a slave spends less than a second computing its task, then this master-slave scheme might introduce too much overhead.

louisXW commented 5 years ago

It confuses me as well that the slave sharing a core with the master is so slow. In my case, the slave task is much more expensive than the master task (20 minutes vs. seconds), so the master task should be negligible. However, one possible reason I am wondering about is memory rather than CPU. The master and the slave share not only the CPU but also memory resources, which are really important in my case for solving the PDE models. Is it possible that part of the memory is occupied by the master for the whole run time, so that less memory is left for the slave that shares the core with the master?

Does this make sense? Is there a way to confirm this?

luca-s commented 5 years ago

I thought more about this and there might be a test you can run to better understand what's going on. I don't know what OS you run the code on, but I hope the command-line 'top' utility is available, which allows you to monitor per-core CPU usage. If you run 'top' and then press '1' you will see real-time CPU usage for every core. At this point you run your code with mpirun -np n python *.py. Now, if our reasoning about the master CPU usage being negligible is true, you should see that most of the time n-1 cores are at 100% usage while the last core (the master's) is used only now and then, but is mostly idle. Depending on the OS scheduler, the master process could be assigned to a different core at different points in time, but the overall picture is that you have n-1 cores fully used and 1 mostly idle. If this is not true, then you have found why one of the slaves takes much more time to complete.
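
If you prefer to log per-core usage from Python rather than watching top, a small sketch using the third-party psutil package would also work (psutil is an assumption here, it is not part of this repo):

    import psutil   # third-party package, assumed installed: pip install psutil

    # Print per-core CPU usage once per second, similar to 'top' after pressing '1'.
    # With the master mostly sleeping you should see n-1 cores near 100% and one
    # core mostly idle.
    for _ in range(30):
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)
        print(' '.join(f'{p:5.1f}%' for p in per_core))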

I also want to add to my previous reply that, for the master CPU usage to be negligible, the time the master is awake on each iteration must be much smaller than the time it spends sleeping.

So, let's say your master spends 0.2 sec doing its job and then sleeps for 0.3 sec: that means the master uses 0.2 / (0.2 + 0.3) = 40% of a core! In this case I would suggest changing the sleep time so that the master's actual awake time is again negligible compared to the sleep time.

What are the risks of choosing a master sleep time that is too big? If a slave finishes its job, it does nothing until the master wakes up again and gives it a new task. So the ideal sleep time should be much greater than the master's awake time and much smaller than the average slave execution time.
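
As a back-of-the-envelope check, with hypothetical numbers (apart from the roughly 20-minute slave task you mentioned):

    # Hypothetical numbers illustrating how to choose the master sleep time.
    master_awake = 0.005          # seconds the master needs per iteration (a few ms)
    slave_task   = 20 * 60.0      # seconds per slave task (~20 minutes in this thread)

    sleep_time = 0.3              # candidate: master_awake << sleep_time << slave_task

    master_usage = master_awake / (master_awake + sleep_time)
    slave_idle   = sleep_time / slave_task    # worst-case wait after a slave finishes a task

    print(f"master core usage   ~ {master_usage:.1%}")   # ~1.6%
    print(f"slave idle overhead ~ {slave_idle:.4%}")     # ~0.0250%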

Regarding the memory concern you have, I don't believe that's an issue, but it's hard to say without knowing the details of your code.

luca-s commented 5 years ago

I forgot to say that I don't believe there is anything wrong with having a master process that uses a lot of CPU. Simply avoid calling time.sleep(...) so that the master makes use of all the computing power, and then run mpirun -np n python *.py instead of mpirun -np n+1 python *.py.

luca-s commented 5 years ago

I am closing this. Feel free to reopen the issue if you still have questions.

louisXW commented 5 years ago

Hi luca-s,

I have spent some time looking at this and have done two tests on this problem, one launching no more processes than there are cores (23 workers + 1 master) and one launching one process more than the number of cores:

The results indicate that in test1 no worker shared a core with the master, and the CPU usage of each of the 24 processes (23 workers + 1 master) was around 100%. In test2, there was one worker sharing a core with the master, and the CPU was split roughly half and half between them, around 50% each. The other processes in test2 stayed at 100% CPU usage the whole time and did not share a core with the master.

It seems that the master uses the CPU all the time even though it has no computational task to do (only message checking). When a worker shares a core with the master, the OS scheduler gives half of the CPU to the master and the other half to the worker, which is why that worker is twice as slow as the other workers.

I am wondering if there is a way to reduce the CPU usage of the master, since the master has little computing to do while waiting for the results of the workers. Would the time.sleep() function work if I increase the sleeping time of the master? In the current code, the master is implemented with a while True loop that polls for messages while the workers are doing the computing tasks:

        while True:
            if comm.Iprobe(source=MPI.ANY_SOURCE, tag=0):
                s = MPI.Status()
                data = comm.recv(status=s, source=MPI.ANY_SOURCE, tag=0)
                # ================================ #
                # Handle the message
                # ================================ #

BTW, I did not see where the time.sleep() function is. Is it inside the MPI implementation?

Thanks

luca-s commented 5 years ago

The time.sleep() I am referring to is the one used in the tutorial. Please have a look at the reference example I gave in the README, in particular Example 1 in the section "Writing your Application". My discussion was always referring to the example master (MyApp) shown there. Let me know if this clarifies things.

louisXW commented 5 years ago

Thanks! It does help. I added time.sleep(30) to my master and the CPU usage of the master did drop to around 1%.
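
Roughly, the polling loop quoted above now looks like this (a sketch; the exact placement of the sleep in my code may differ):

        # Assumes 'import time' and the same 'comm' / 'MPI' objects as above.
        while True:
            if comm.Iprobe(source=MPI.ANY_SOURCE, tag=0):
                s = MPI.Status()
                data = comm.recv(status=s, source=MPI.ANY_SOURCE, tag=0)
                # ================================ #
                # Handle the message
                # ================================ #
            else:
                # Nothing pending: release the core instead of busy-polling.
                # 30 seconds is fine here because each worker task takes ~20 minutes.
                time.sleep(30)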

luca-s commented 5 years ago

Good! I am happy it worked out well for you.