radical-experiments / AIMES-Experience

Experiments for the AIMES practice paper
MIT License

Specifying flops only when emulating workflow using task.py #9

Open mingtaiha opened 8 years ago

mingtaiha commented 8 years ago

@andre-merzky, I've been using task.py to emulate the workflow. While I am not getting any runtime errors, I find that the task_length specified controls the length of the emulation instead of the flops. Is it possible to specify the number of flops we wish to emulate, but not the amount of time?

andre-merzky commented 8 years ago

Hi Ming,

First, I pushed small updates to both the synapse and skeleton branches; those should now be able to handle no-file-io cases like this command:

./bin/aimes-skeleton-synapse.py serial flops 1 10000000000 1 1 0 0

you still need the last 4 args though.

Second, if parameter 2 is set to flops instead of time, it should in fact influence the number of flops computed. What is the behavior in your case, why do you think sleep is used?

mingtaiha commented 8 years ago

Hey Andre, thanks for getting back to me.

To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds. While I don't think that sleep is being used, it seems that when I specify the time of the sample, Synapse emulates based on the length of the time specified in the sample.

To the first point, I looked at aimes-skeleton-synapse.py. I am a little confused by what the last 4 parameters are in the command you gave. I understand the first 4 parameters, which are task_type, task_mode, num_tasks, and task_length, but the script still takes 5 more args: the sizes of the read/write buffers, the numbers of input/output files, and the interleave option.

Also, aimes-skeleton-synapse.py assumes there are input and output files as arguments in the main function, using the functions readfiles and writefiles to emulate the reads/writes. Can I remove these functions since there are no reads/writes?

andre-merzky commented 8 years ago

> To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds.

Well, when you specify a large number of flops, the emulation is supposed to take a couple of seconds, to actually perform that amount of computation. That is kind of the point :)

> While I don't think that sleep is being used, it seems that when I specify the time of the sample, Synapse emulates based on the length of the time specified in the sample.

I am not sure what you mean with 'time of the sample'. If task_mode is set to flops, then the task_length is interpreted as flops, not time. If the task_mode is set to time, then indeed, task_length is interpreted as time, and the emulation will in fact sleep for that number of seconds.

It would be helpful if you include the commands you run, explain what you would expect, and describe what you in fact observe.

> To the first point, I looked at aimes-skeleton-synapse.py. I am a little confused by what the last 4 parameters are in the command you gave. I understand the first 4 parameters, which are task_type, task_mode, num_tasks, and task_length, but the script still takes 5 more args: the sizes of the read/write buffers, the numbers of input/output files, and the interleave option.

Indeed - that is how task.c behaves, and the main purpose of task.py (or now aimes-skeleton-synapse.py) is to mimic the syntax of task.c, including its quirks.
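To make the argument layout discussed above concrete, here is a minimal sketch that maps the nine positional arguments onto named fields. The names and order of the last five fields (`read_buf`, `write_buf`, `num_inputs`, `num_outputs`, `interleave`) are assumptions inferred from this thread, not taken from aimes-skeleton-synapse.py or task.c, and only the two task modes discussed here are handled.

```python
from collections import namedtuple

# Field names after task_length are hypothetical; they follow the
# description given in this thread, not the actual script.
TaskArgs = namedtuple("TaskArgs", [
    "task_type", "task_mode", "num_tasks", "task_length",
    "read_buf", "write_buf", "num_inputs", "num_outputs", "interleave",
])


def parse_task_args(argv):
    """Map nine positional arguments onto named fields."""
    if len(argv) != 9:
        raise ValueError("expected 9 arguments, got %d" % len(argv))
    task_type, task_mode = argv[0], argv[1]
    if task_mode not in ("flops", "time"):
        raise ValueError("unknown task_mode: %r" % task_mode)
    numbers = [int(a) for a in argv[2:]]
    return TaskArgs(task_type, task_mode, *numbers)


def describe(args):
    """Say how task_length is interpreted for the given task_mode."""
    if args.task_mode == "flops":
        return "compute %d flops per task" % args.task_length
    return "run each task for %d seconds" % args.task_length


example = parse_task_args("serial flops 1 10000000000 1 1 0 0 0".split())
print(describe(example))
```

The point of the sketch is only that task_length has no unit of its own: task_mode decides whether it is read as an operation count or as a duration.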

> Also, aimes-skeleton-synapse.py assumes there are input and output files as arguments in the main function, using the functions readfiles and writefiles to emulate the reads/writes. Can I remove these functions since there are no reads/writes?

I am not sure what you mean with 'remove the functions'. We certainly don't want to remove the code in the repository, only to add it again later when we might run an experiment including I/O, right? With the latest versions from the skeleton and synapse repos (in the branches we discussed), you should be able to run emulation which does not involve any I/O, both with skeleton and synapse emulation:

(ve) $ time ./bin/aimes-skeleton-synapse.py serial flops 1 10000000000 1 1 0 0 0
[...]
r:0m2.767s  u:0m2.636s  s:0m0.016s

(ve) $ time ./bin/aimes-skeleton-synapse.py serial time 1 3 1 1 0 0 0
[...]
r:0m3.966s  u:0m3.844s  s:0m0.012s

(ve) $ time ./src/aimes/skeleton/task  serial time 1 3 1 1 0 0 0
[...]
r:0m3.005s  u:0m0.000s  s:0m0.000s

mturilli commented 8 years ago

On Wed, Jun 29, 2016 at 3:03 PM, Andre Merzky notifications@github.com wrote:

> > To the second point, when I specify task_mode = flops, the execution of the emulation lasts several seconds.

> Well, when you specify a large number of flops, the emulation is supposed to take a couple of seconds, to actually perform that amount of computation. That is kind of the point :)

I am not sure I understand what this means. Ming's tests show that the task we are using will take around 430 seconds to complete on Stampede and around 200 on SuperMIC, as currently profiled. How many flops do we need to set to obtain these results? More specifically, what number of flops do we need to request for Skeleton+Synapse to produce a profile with characteristics analogous to the one we have used so far for Ming's tests?

Dr Matteo Turilli Department of Electrical and Computer Engineering Rutgers University

andre-merzky commented 8 years ago

> How many flops do we need to set to obtain these results?

Uh, I don't know! Is that one of the workflow tasks for which we have the profiles? Then the flops should be in those json files. If there is no profile, then you might want to run a certain number of flops, say 100000000000, and time that. Once you measured that, you'll know how many flops per second you can expect, and then scale the number up to the target of 430 seconds. Does that make sense?
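The calibrate-then-scale procedure described above can be sketched as follows. This is only an illustration of the arithmetic; the function name `flops_for_target` is hypothetical, and the timing has to be measured on the target machine, since the achievable flops per second differs between resources.

```python
def flops_for_target(calib_flops, calib_seconds, target_seconds):
    """Scale a calibration run to a target runtime.

    calib_flops    -- number of flops requested in the calibration run
    calib_seconds  -- measured wall time of that run
    target_seconds -- runtime the emulation should reach
    """
    if calib_seconds <= 0:
        raise ValueError("calibration time must be positive")
    flops_per_second = calib_flops / float(calib_seconds)
    return int(round(flops_per_second * target_seconds))


# Usage: request 100000000000 flops, time the run, then scale to 430 s.
# The measured time below is a placeholder, not a real measurement.
measured_seconds = 25.0
print(flops_for_target(100000000000, measured_seconds, 430))
```

This assumes runtime grows linearly with the number of flops; any fixed startup overhead in the calibration run will skew the estimate, so a longer calibration run should give a better rate.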

mturilli commented 8 years ago

On Wed, Jun 29, 2016 at 4:29 PM, Andre Merzky notifications@github.com wrote:

> Uh, I don't know! Is that one of the workflow tasks for which we have the profiles? Then the flops should be in those json files. If there is no profile, then you might want to run a certain number of flops, say 100000000000, and time that. Once you measured that, you'll know how many flops per second you can expect, and then scale the number up to the target of 430 seconds. Does that make sense?

Yes.

Ming: in your test, did you use the same number of flops as in the profile you used for your tests?

mingtaiha commented 8 years ago

Yes. The number of flops which I used to run aimes-skeleton-synapse.py was from one of Vivek's profiles, and I profiled the runtime of the profile using the command radical-synapse-emulate -i <vivek_profile.json>.

EDIT: Would the writing which radical-synapse-emulate to stdout cause the time difference?

andre-merzky commented 8 years ago

Sorry Ming, I am at a loss still. I still don't understand how exactly the result differs from expectation. What is the exact result, what did you expect, why did you expect that?

Also, how does the which call enter the picture?

mturilli commented 8 years ago

Should we have a meeting today to speed this up?

On Thu, Jun 30, 2016 at 4:17 AM, Andre Merzky notifications@github.com wrote:

> Sorry Ming, I am at a loss still. I still don't understand how exactly the result differs from expectation. What is the exact result, what did you expect, why did you expect that?

> Also, how does the which call enter the picture?

andre-merzky commented 8 years ago

Yeah, that probably makes sense.

I'll be available for the next 1.5 hours, then again after 4.30pm E. I can also make other times work if that is too restrictive.

mturilli commented 8 years ago

4:30pm E works for me, thank you. Ming?

On Thu, Jun 30, 2016 at 6:23 AM, Andre Merzky notifications@github.com wrote:

> Yeah, that probably makes sense.

> I'll be available for the next 1.5 hours, then again after 4.30pm E. I can also make other times work if that is too restrictive.

andre-merzky commented 8 years ago

FWIW, I added profile-based emulation to aimes-skeleton-synapse.py, in a somewhat hackish way. It now allows the following in the skeleton input:

[...]
Stage_Name = Stage_1
    Task_Type = serial
    Task_Mode = prof
    Task_Profile = 84.json
    Interleave_Option = 0
[...]

It will only work with Interleave_Option = 0 -- otherwise the profile sampling and the interleave-chopping will conflict.

This needs the latest version of the synapse branch feature/named_storage. The given profile must be in the current workdir.

This is not expected to be merged into skeleton any time soon -- it's way too unclean. But it might suffice for some experiments...

mingtaiha commented 8 years ago

4:30pm E is fine

mingtaiha commented 8 years ago

@andre-merzky, I think I am able to narrow down the point of confusion. Given the profile, say 84.json, would you say that the total number of flops performed in this operation is the sum of the flops value of each entry in the list sequence?

andre-merzky commented 8 years ago

Yes, that would be correct.
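For illustration, the summation confirmed above can be sketched as follows. The profile layout used here (a top-level list named `sequence` whose entries each carry a `flops` value) is an assumption based on the wording in this thread, not a documented Synapse schema, and the sample values are made up.

```python
import json


def total_flops(profile):
    """Sum the flops value of every entry in the profile's sequence list.

    Entries without a 'flops' key count as zero.
    """
    return sum(entry.get("flops", 0) for entry in profile["sequence"])


# Hypothetical stand-in for a file such as 84.json; values are invented.
sample = json.loads('{"sequence": [{"flops": 10}, {"flops": 20}, {"flops": 12}]}')
print(total_flops(sample))
```

For a real profile, the same function could be applied to the result of `json.load` on the opened file, once the actual key names have been checked against the profile.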

mingtaiha commented 8 years ago

!!!! Therein lies the confusion. I read the profile wrong, then. I had been reading flops_per_core, since we are only using one core. I later realized that I had to take the sum of the flops values of the entries in the list sequence, and that gave a more consistent flops count.

I do have some additional questions, though. What does flops mean here: Floating Point Operations, or Floating Point Operations per second? I'd like to get a better handle on what each field of a Synapse profile means, to avoid future confusion.