robfitzgerald / csci5593-project

exploring the performance of Open-MPI on our new Intel Xeon E5-2650v4 cluster, Heracles

Baseline Node Communication #3

Closed robfitzgerald closed 7 years ago

robfitzgerald commented 7 years ago

We want to establish a baseline figure for the communication time between nodes. While the times will likely be close for every pair, measuring them would make our study more rigorous.

Send a large number of messages between each combination of nodes and average the result. Maybe consider running the test a few times, at different times of day.

mkalan commented 7 years ago

I think this can be as simple as running the complete topology test for one iteration, for one message. Or we could crank up the message value and average the runtime. Either way, we wouldn't need to code anything, just analyze the data and record our findings.

robfitzgerald commented 7 years ago

Yes, we want an average, so cranking up the message value would be good.

Thing is, for this as well as for star, we want to control the network for these tests, right? If we use the "complete" test for star and for baseline, we will be throwing many messages between many nodes, and this may result in different network performance than only sending messages between a pair at a time.

mkalan commented 7 years ago

We had discussed using the star topology for the baseline test. We then planned to pass the center node to the executable, in an attempt to clear the message passing system after each test. Now that I have thought about this more, I'm concerned that the program may not place processes in the same order each time it is called, which would give uncontrolled results. I don't have a solution yet, but the best option may be to handle this within the C++ code.

robfitzgerald commented 7 years ago

I imagine it might be something like this: run srun on nodes x,y for every x,y in the set of two-node combinations. Declare 1 process per node and 500 messages.
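
One way to generate those runs, as a rough sketch: the Python below enumerates every two-node combination and builds one srun command per pair. The node names, the ./baseline executable, and its message-count argument are placeholders, not names from the project; --nodes, --ntasks-per-node, and --nodelist are standard srun options.

```python
from itertools import combinations


def baseline_commands(nodes, n_messages=500, exe="./baseline"):
    """Build one srun command (as an argument list) per unordered node pair.

    `exe` and its single message-count argument are hypothetical.
    """
    cmds = []
    for x, y in combinations(nodes, 2):
        cmds.append([
            "srun",
            "--nodes=2",              # exactly two nodes per run
            "--ntasks-per-node=1",    # 1 process per node
            f"--nodelist={x},{y}",    # pin the run to this pair
            exe,
            str(n_messages),
        ])
    return cmds


# Hypothetical node names, for illustration only.
for cmd in baseline_commands(["node1", "node2", "node3"]):
    print(" ".join(cmd))
```

Each command could then be handed to subprocess.run, one pair at a time, so only a single pair is on the network during a measurement. If direction matters (x to y versus y to x), itertools.permutations would cover both orders.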

robfitzgerald commented 7 years ago

And then the code itself just has process 0 send process 1 all the messages.
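
A minimal sketch of that loop, written in Python against the mpi4py communicator interface rather than in the project's C++ (an assumption made to keep the example short). The function name and defaults are illustrative. The final acknowledgement from process 1 is an addition here, so that the timer on process 0 covers delivery and not just the sends.

```python
import time


def time_messages(comm, n_messages=500, payload=b"x"):
    """Process 0 sends n_messages to process 1.

    Returns the mean wall-clock seconds per message as seen by the caller.
    `comm` is expected to behave like an mpi4py communicator.
    """
    rank = comm.Get_rank()
    comm.Barrier()  # line both processes up before timing starts
    start = time.perf_counter()
    if rank == 0:
        for _ in range(n_messages):
            comm.send(payload, dest=1, tag=0)
        comm.recv(source=1, tag=1)  # wait for the acknowledgement
    elif rank == 1:
        for _ in range(n_messages):
            comm.recv(source=0, tag=0)
        comm.send(None, dest=0, tag=1)  # tell process 0 everything arrived
    elapsed = time.perf_counter() - start
    return elapsed / n_messages
```

Launched with two tasks, this would be called as time_messages(MPI.COMM_WORLD) after "from mpi4py import MPI", with process 0 recording the returned average.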
