mingtaiha opened 8 years ago
Hi Ming,
First, I pushed small updates to both the synapse and skeleton branches; those should now be able to handle no-file-I/O cases like this command:
./bin/aimes-skeleton-synapse.py serial flops 1 10000000000 1 1 0 0
You still need the last 4 args, though.
Second, if parameter 2 is set to flops instead of time, it should in fact influence the number of flops computed. What is the behavior in your case, and why do you think sleep is used?
Hey Andre, thanks for getting back to me.
To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds. While I don't think that sleep is being used, it seems that when I specify the time of the sample, Synapse emulates based on the length of the time specified in the sample.
To the first point, I looked at aimes-skeleton-synapse.py. I am a little confused by the last 4 parameters in the command you gave. I understand the first 4 parameters, which are task_type, task_mode, num_tasks, and task_length, but the script still takes 5 more args: the size of the read/write buffers, the number of input/output files, and the interleave option.
Also, aimes-skeleton-synapse.py assumes there will be input and output files as arguments in the main function, using the functions readfiles and writefiles to emulate the reads/writes. Can I remove these functions, since there are no reads/writes?
> To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds.
Well, when you specify a large number of flops, the emulation is supposed to take a couple of seconds, to actually perform that amount of computation. That is kind of the point :)
> While I don't think that sleep is being used, it seems that when I specify the time of the sample, Synapse emulates based on the length of the time specified in the sample.
I am not sure what you mean by 'time of the sample'. If task_mode is set to flops, then task_length is interpreted as flops, not time. If task_mode is set to time, then indeed task_length is interpreted as time, and the emulation will in fact sleep for that number of seconds.
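A minimal Python sketch of the mode-dependent interpretation of task_length described above. The helper names run_task and burn_flops are hypothetical, not the actual aimes-skeleton-synapse.py or Synapse API, and the pure-Python loop is only a toy stand-in for whatever compute kernel Synapse really uses.

```python
import time


def burn_flops(n_flops: int) -> float:
    """Toy CPU burner (hypothetical): performs roughly n_flops floating-point ops.

    Each loop iteration does one multiply and one add, i.e. 2 flops.
    The factor is below 1, so the accumulator stays bounded (it tends to 1e6).
    """
    acc = 0.0
    factor = 0.999999
    for _ in range(n_flops // 2):
        acc = acc * factor + 1.0  # 1 multiply + 1 add
    return acc


def run_task(task_mode: str, task_length: float) -> None:
    """Interpret task_length according to task_mode (hypothetical helper)."""
    if task_mode == "time":
        # task_length is a duration in seconds: the emulation just sleeps
        time.sleep(task_length)
    elif task_mode == "flops":
        # task_length is a flop count: the emulation actually computes
        burn_flops(int(task_length))
    else:
        raise ValueError(f"unknown task_mode: {task_mode!r}")
```

In this sketch, a flops run takes longer the larger task_length is and the slower the machine is, while a time run takes the same wall time everywhere.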
It would be helpful if you included the commands you ran, explained what you expected, and described what you actually observed.
> To the first point, I looked at aimes-skeleton-synapse.py. I am a little confused by the last 4 parameters in the command you gave. I understand the first 4 parameters, which are task_type, task_mode, num_tasks, and task_length, but the script still takes 5 more args: the size of the read/write buffers, the number of input/output files, and the interleave option.
Indeed, that is how task.c behaves, and the main purpose of task.py (or now aimes-skeleton-synapse.py) is to mimic the syntax of task.c, including its quirks.
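To make the 9-argument, task.c-compatible syntax concrete, here is a hypothetical Python parser. The field names and the order of the last five arguments (read buffer size, write buffer size, number of input files, number of output files, interleave option) are inferred from the description above and from the commands in this thread, not taken from the script's source.

```python
from typing import NamedTuple, Sequence


class TaskArgs(NamedTuple):
    """Positional arguments, in the assumed task.c order (hypothetical names)."""
    task_type: str          # e.g. "serial"
    task_mode: str          # "flops" or "time"
    num_tasks: int
    task_length: float      # flops or seconds, depending on task_mode
    read_buffer_size: int
    write_buffer_size: int
    num_input_files: int
    num_output_files: int
    interleave_option: int


def parse_task_args(argv: Sequence[str]) -> TaskArgs:
    """Map positional CLI arguments onto TaskArgs (hypothetical helper)."""
    if len(argv) != len(TaskArgs._fields):
        raise ValueError(
            f"expected {len(TaskArgs._fields)} arguments, got {len(argv)}"
        )
    return TaskArgs(
        task_type=argv[0],
        task_mode=argv[1],
        num_tasks=int(argv[2]),
        task_length=float(argv[3]),
        read_buffer_size=int(argv[4]),
        write_buffer_size=int(argv[5]),
        num_input_files=int(argv[6]),
        num_output_files=int(argv[7]),
        interleave_option=int(argv[8]),
    )
```

For example, parse_task_args("serial flops 1 10000000000 1 1 0 0 0".split()) yields zero input and zero output files, which under this assumed ordering is what makes it a no-I/O run.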
> Also, aimes-skeleton-synapse.py assumes there will be input and output files as arguments in the main function, using the functions readfiles and writefiles to emulate the reads/writes. Can I remove these functions, since there are no reads/writes?
I am not sure what you mean by 'remove the functions'. We certainly don't want to remove the code from the repository, only to add it again later when we might run an experiment that includes I/O, right? With the latest versions from the skeleton and synapse repos (in the branches we discussed), you should be able to run emulations that do not involve any I/O, with both skeleton and synapse emulation:
(ve) $ time ./bin/aimes-skeleton-synapse.py serial flops 1 10000000000 1 1 0 0 0
[...]
r:0m2.767s u:0m2.636s s:0m0.016s
(ve) $ time ./bin/aimes-skeleton-synapse.py serial time 1 3 1 1 0 0 0
[...]
r:0m3.966s u:0m3.844s s:0m0.012s
(ve) $ time ./src/aimes/skeleton/task serial time 1 3 1 1 0 0 0
[...]
r:0m3.005s u:0m0.000s s:0m0.000s
On Wed, Jun 29, 2016 at 3:03 PM, Andre Merzky notifications@github.com wrote:
>> To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds.
> Well, when you specify a large number of flops, the emulation is supposed to take a couple of seconds, to actually perform that amount of computation. That is kind of the point :)
I am not sure I understand what this means. Ming's tests show that the task we are using takes around 430 seconds to complete on Stampede and around 200 seconds on SuperMIC, as currently profiled. How many flops do we need to set to obtain these results? More specifically, what number of flops do we need to request for Skeleton+Synapse to produce a profile with characteristics analogous to the one we have used so far for Ming's tests?
Dr Matteo Turilli Department of Electrical and Computer Engineering Rutgers University
> How many flops do we need to set to obtain these results?
Uh, I don't know! Is that one of the workflow tasks for which we have profiles? Then the flops should be in those json files. If there is no profile, you might want to run a certain number of flops, say 100000000000, and time that. Once you have measured that, you'll know how many flops per second you can expect, and you can then scale the number up to the target of 430 seconds. Does that make sense?
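The calibration described above is a single proportion: measure a flop rate once, then scale it to the target wall time. A minimal Python sketch, assuming the emulated flop rate stays roughly constant as the flop count grows; the function name is hypothetical, and the calibration run has to happen on the target resource (e.g. Stampede), since rates differ between machines.

```python
def flops_for_target(calib_flops: float, calib_seconds: float,
                     target_seconds: float) -> float:
    """Scale a measured flop rate to a target runtime (hypothetical helper).

    rate   = calib_flops / calib_seconds   (flops per second)
    result = rate * target_seconds         (flops to request as task_length)
    """
    if calib_seconds <= 0:
        raise ValueError("calib_seconds must be positive")
    rate = calib_flops / calib_seconds
    return rate * target_seconds
```

Purely as an illustration, reusing the 10000000000-flop run shown earlier (2.767 s real time on that machine), flops_for_target(1e10, 2.767, 430) comes out at roughly 1.55e12 flops for a 430-second task; the real value for Stampede or SuperMIC has to come from a run on that machine.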
On Wed, Jun 29, 2016 at 4:29 PM, Andre Merzky notifications@github.com wrote:
> Uh, I don't know! Is that one of the workflow tasks for which we have profiles? Then the flops should be in those json files. If there is no profile, you might want to run a certain number of flops, say 100000000000, and time that. Once you have measured that, you'll know how many flops per second you can expect, and you can then scale the number up to the target of 430 seconds. Does that make sense?
Yes.
Ming: did you use the same number of flops in your test as in the profile you used for your tests?
Yes. The number of flops I used to run aimes-skeleton-synapse.py came from one of Vivek's profiles, and I measured the runtime of that profile using the command radical-synapse-emulate -i <vivek_profile.json>.
EDIT: Would the writing which radical-synapse-emulate to stdout cause the time difference?
Sorry Ming, I am still at a loss. I don't understand how exactly the result differs from your expectation. What exactly is the result, what did you expect, and why did you expect that?
Also, how does the which call enter the picture?
Should we have a meeting today to speed this up?
On Thu, Jun 30, 2016 at 4:17 AM, Andre Merzky notifications@github.com wrote:
> Sorry Ming, I am still at a loss. I don't understand how exactly the result differs from your expectation. What exactly is the result, what did you expect, and why did you expect that?
> Also, how does the which call enter the picture?
Yeah, that probably makes sense.
I'll be available for the next 1.5 hours, then again after 4:30pm E. I can also make other times work if that is too restrictive.
4:30pm E works for me, thank you. Ming?
On Thu, Jun 30, 2016 at 6:23 AM, Andre Merzky notifications@github.com wrote:
> Yeah, that probably makes sense.
> I'll be available for the next 1.5 hours, then again after 4:30pm E. I can also make other times work if that is too restrictive.
FWIW, I added profile-based emulation to aimes-skeleton-synapse.py, in a somewhat hackish way. It now allows the following in the skeleton input:
[...]
Stage_Name = Stage_1
Task_Type = serial
Task_Mode = prof
Task_Profile = 84.json
Interleave_Option = 0
[...]
It will only work with Interleave_Option = 0; otherwise the profile sampling and the interleave chopping will conflict.
This needs the latest version of the synapse branch feature/named_storage. The given profile must be in the current working directory.
This is not expected to be merged into skeleton any time soon; it's way too unclean. But it might suffice for some experiments...
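The constraints stated above (profile mode only works with Interleave_Option = 0, and the profile file must sit in the current working directory) can be checked before a run. A hypothetical Python sketch; it assumes the stage block consists of plain Key = Value lines as in the excerpt, which may not cover every feature of the real skeleton input format, and the function names are illustrative only.

```python
import os
from typing import Dict, List


def parse_stage(text: str) -> Dict[str, str]:
    """Parse 'Key = Value' lines of a stage block (hypothetical helper)."""
    cfg: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # blank lines, '[...]' elisions, etc.
        key, value = line.split("=", 1)
        cfg[key.strip()] = value.strip()
    return cfg


def check_prof_stage(cfg: Dict[str, str], workdir: str = ".") -> List[str]:
    """Return a list of problems for a Task_Mode = prof stage (empty if none)."""
    problems: List[str] = []
    if cfg.get("Task_Mode") != "prof":
        return problems  # the constraints below only apply to profile mode
    # Profile sampling conflicts with interleave chopping; require an explicit 0.
    if cfg.get("Interleave_Option") != "0":
        problems.append("Task_Mode = prof requires Interleave_Option = 0")
    profile = cfg.get("Task_Profile")
    if not profile:
        problems.append("Task_Mode = prof requires a Task_Profile")
    elif not os.path.isfile(os.path.join(workdir, profile)):
        # The profile is expected in the working directory of the run.
        problems.append(f"profile {profile!r} not found in {workdir!r}")
    return problems
```

Running check_prof_stage(parse_stage(stage_text)) from the directory the emulation will start in returns an empty list when the stage looks usable.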
4:30pm E is fine
Yes, that would be correct.
!!!! Therein lies the confusion. I read the profile wrong, then. My expectation was that reading flops_per_core would be enough, since we are only using one core. I later realized that I had to take the sum of the flops value of each entry in the list sequence, and that gave a more consistent flops count.
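A minimal Python sketch of the summation described above, assuming each entry of the profile's sequence list carries a numeric flops field at its top level; the exact nesting in real Synapse profiles may differ, and the function names are hypothetical.

```python
import json
from typing import Any, Dict


def total_flops(profile: Dict[str, Any]) -> float:
    """Sum the 'flops' value over all entries of profile['sequence'].

    Entries without a 'flops' key contribute 0 (an assumption).
    """
    return sum((entry.get("flops", 0) for entry in profile.get("sequence", [])), 0.0)


def total_flops_from_file(path: str) -> float:
    """Load a profile JSON file and sum its per-entry flops (hypothetical helper)."""
    with open(path) as f:
        return total_flops(json.load(f))
```

Following the finding above, it is this total, not flops_per_core, that would be passed as task_length when task_mode is set to flops.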
I do have some additional questions, though. What does flops mean here? Also, does flops mean floating-point operations, or floating-point operations per second? I'd like to get a better handle on what each of the fields of a Synapse profile means, to avoid future confusion.
@andre-merzky, I've been using task.py to emulate the workflow. While I am not getting any runtime errors, I find that the specified task_length controls the length of the emulation instead of the flops. Is it possible to specify the number of flops we wish to emulate instead of the amount of time?