Distinguish CPU slot by number of threads

bb30994 commented 6 years ago

When I summarize data by slot, HFM doesn't mix data for one type of GPU with another, and I know which GPU model is in each slot. This doesn't work well for CPU slots, because I change the number of CPUs allocated to a slot. I'd like to be able to distinguish a slot running with CPU:M from the same slot running with CPU:N (for M ≠ N).

bb30994 commented 6 years ago

 Project ID: 9431
 Core: OPENMM_21
 Credit: 7600
 Frames: 100

 Name: local Slot 01
 Path: 127.0.0.1-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:04:46 - 77,277.1 PPD
 Avg. Time / Frame : 00:04:54 - 74,144.5 PPD

 Name: local Slot 02
 Path: 127.0.0.1-36330
 Number of Frames Observed: 300

 Min. Time / Frame : 00:02:57 - 158,722.9 PPD
 Avg. Time / Frame : 00:02:58 - 157,387.4 PPD

That report isn't particularly useful, since the CPUs for Slot 1 and Slot 2 are in fact identical. A more useful report would say:

 Number of CPU:6 Frames Observed: 300 
...
 Number of CPU:3 Frames Observed: 300 
...

Moreover, it might actually be saying something like:

 Number of CPU:6 Frames Observed: 270
... 
 Number of CPU:4 Frames Observed: 30
...
 Number of CPU:3 Frames Observed: 300
...

harlam357 commented 6 years ago

Hi Bruce - So let me make sure I understand correctly. You want to group the benchmarks per slot based on the configured CPUs for the slot. So for example, a single slot would have a different group of benchmarks based on how many CPUs were configured when the frames were observed.

 Name: Slot 01
 Path: 127.0.0.1-36330

 Number of Frames Observed: 100 CPU:8
 Min. Time / Frame : 00:01:46 - 32,010.5 PPD
 Avg. Time / Frame : 00:01:51 - 29,872.2 PPD

 Number of Frames Observed: 200 CPU:6
 Min. Time / Frame : 00:01:58 - 20,000.5 PPD
 Avg. Time / Frame : 00:02:00 - 19,500.2 PPD
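As a rough illustration of the grouping described here, the following Python sketch keys frame times by both slot name and CPU thread count, so each configuration gets its own minimum and average. This is not HFM's implementation: the `observations` tuples, the `group_benchmarks` helper, and the frame times are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict
from datetime import timedelta

# Hypothetical observations: (slot name, CPU threads in use, frame time).
observations = [
    ("Slot 01", 8, timedelta(minutes=1, seconds=46)),
    ("Slot 01", 8, timedelta(minutes=1, seconds=56)),
    ("Slot 01", 6, timedelta(minutes=1, seconds=58)),
    ("Slot 01", 6, timedelta(minutes=2, seconds=2)),
]


def group_benchmarks(observations):
    """Group frame times by (slot, threads) and summarize each group."""
    groups = defaultdict(list)
    for slot, threads, frame_time in observations:
        groups[(slot, threads)].append(frame_time)

    summary = {}
    for key, times in groups.items():
        summary[key] = {
            "frames": len(times),
            "min": min(times),
            # Summing timedeltas needs an explicit timedelta start value.
            "avg": sum(times, timedelta()) / len(times),
        }
    return summary


summary = group_benchmarks(observations)
for (slot, threads), stats in sorted(summary.items()):
    print(f"{slot} CPU:{threads} frames={stats['frames']} "
          f"min={stats['min']} avg={stats['avg']}")
```

Keying on the pair rather than on the slot alone means that changing a slot's thread count starts a separate group instead of diluting the existing one.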


bb30994 commented 6 years ago

That's the idea, but it's a bit more complicated than that in ways that I hadn't thought of.

I'm running a WU with CPUS=6. I'd like to track all other WUs that have been run in that configuration.

I then reconfigure, splitting that slot into one slot with 2 CPUs and another with 4. The performance figures for those slots should not be mixed with the figures for CPU:6.

If a WU assigned to the CPU:2 slot happens to get reconfigured to run in a CPU:6 slot, FAHClient currently will use only 2 of the CPUs until a new WU is assigned for the CPU:6 configuration. The WU "remembers" the configuration in effect when it was assigned.

What I didn't take into account is what happens when I reduce the number of CPUs. A WU started with CPU:6 CAN be reconfigured to run with only 2 or 4 at the time I split the slot. It will run to completion, albeit more slowly, so some of the frames are completed at one speed and others at a different speed, distorting the average.

The latter makes things much more complicated, and if you choose not to dig that deeply into the relatively exotic case, I'll accept the basic case discussed in the first few paragraphs.
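To make the distortion concrete, here is a small Python sketch. It assumes, based only on the behavior described in this comment, that a WU uses the smaller of the thread count it was assigned with and the slot's current thread count. The `effective_threads` helper, the frame times, and the point at which the slot is reduced are all hypothetical.

```python
from collections import defaultdict


def effective_threads(assigned: int, configured: int) -> int:
    """Threads a WU actually uses under the assumed rule.

    Assumption from the thread: a WU never uses more threads than it was
    assigned with, and can be cut down when the slot is reduced.
    """
    return min(assigned, configured)


# Hypothetical WU assigned with CPU:6; the slot is cut to CPU:4 after 90 frames.
# Each entry: (slot threads configured when the frame completed, frame seconds).
assigned = 6
frames = [(6, 120)] * 90 + [(4, 180)] * 10

# Mixing every frame into one average hides the configuration change.
blended_avg = sum(sec for _, sec in frames) / len(frames)

# Splitting by the threads actually in use keeps each configuration clean.
by_threads = defaultdict(list)
for configured, sec in frames:
    by_threads[effective_threads(assigned, configured)].append(sec)
split_avg = {t: sum(s) / len(s) for t, s in by_threads.items()}

print(blended_avg)  # 126.0
print(split_avg)    # {6: 120.0, 4: 180.0}
```

With these made-up numbers the blended average of 126 seconds matches neither configuration, while the split averages report 120 seconds for CPU:6 and 180 seconds for CPU:4 separately.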

Bruce


harlam357 commented 4 years ago

Released with v0.9.17

Identified CPU or GPU is now included in the Benchmarks data along with actual CPU threads.

harlam357 / hfm-net

Distinguish CPU slot by number of threads #298
