Closed bb30994 closed 4 years ago
Project ID: 9431
Core: OPENMM_21
Credit: 7600
Frames: 100
Name: local Slot 01
Path: 127.0.0.1-36330
Number of Frames Observed: 300
Min. Time / Frame : 00:04:46 - 77,277.1 PPD
Avg. Time / Frame : 00:04:54 - 74,144.5 PPD
Name: local Slot 02
Path: 127.0.0.1-36330
Number of Frames Observed: 300
Min. Time / Frame : 00:02:57 - 158,722.9 PPD
Avg. Time / Frame : 00:02:58 - 157,387.4 PPD
That report isn't particularly useful, because the CPUs for Slot 1 and Slot 2 are, in fact, identical; only the number of CPUs allocated to each slot differs. A more useful report would say:
Number of CPU:6 Frames Observed: 300
...
Number of CPU:3 Frames Observed: 300
...
Moreover, it might actually be saying something like:
Number of CPU:6 Frames Observed: 270
...
Number of CPU:4 Frames Observed: 30
...
Number of CPU:3 Frames Observed: 300
...
Hi Bruce - So let me make sure I understand correctly. You want to group the benchmarks per slot based on the configured CPUs for the slot. So for example, a single slot would have a different group of benchmarks based on how many CPUs were configured when the frames were observed.
Name: Slot 01
Path: 127.0.0.1-36330
Number of Frames Observed: 100 CPU:8
Min. Time / Frame : 00:01:46 - 32,010.5 PPD
Avg. Time / Frame : 00:01:51 - 29,872.2 PPD
Number of Frames Observed: 200 CPU:6
Min. Time / Frame : 00:01:58 - 20,000.5 PPD
Avg. Time / Frame : 00:02:00 - 19,500.2 PPD
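To make the grouping concrete, here is a minimal Python sketch of the idea, not HFM.NET's actual code. It assumes each observed frame is recorded with the slot name and the number of CPU threads in use; `FrameObservation` and `group_benchmarks` are hypothetical names made up for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FrameObservation:
    """One observed frame (hypothetical record, not an HFM.NET type)."""
    slot_name: str
    cpu_threads: int
    seconds: float  # wall-clock time taken by this frame


def group_benchmarks(observations):
    """Group frame times by (slot, CPU threads) and summarize each group."""
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs.slot_name, obs.cpu_threads)].append(obs.seconds)
    return {
        key: {"frames": len(times), "min": min(times), "avg": mean(times)}
        for key, times in groups.items()
    }
```

With this keying, frames observed on Slot 01 at CPU:8 and at CPU:6 land in separate groups, so each configuration gets its own minimum and average frame time.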
That's the idea, but it's a bit more complicated than that in ways that I hadn't thought of.
I'm running a WU with CPUS=6. I'd like to track all other WUs that have been run in that configuration.
I reconfigure so that one slot has 2 CPUs and another has 4. The performance figures for those slots should not be mixed with the figures for CPU:6.
If a WU assigned to the CPU:2 slot happens to get reconfigured to run in a CPU:6 slot, FAHClient currently will use only 2 of the CPUs until a new WU is assigned for the CPU:6 configuration. The WU "remembers" the configuration that was in effect when it was assigned.
What I didn't take into account is what happens when I reduce the number of CPUs. A WU started with CPU:6 CAN be reconfigured to run with only 2 or 4 at the time I split the slot. It will run to completion, albeit slower, so some of the frames are completed at one speed and others at a different speed, distorting the average.
The latter makes things much more complicated, and if you choose not to dig that deeply into this relatively exotic case, I'll accept the basic case discussed in the first few paragraphs.
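A small Python illustration of the distortion, using made-up numbers: a hypothetical WU runs 90 frames at CPU:6 and, after the slot is split, the remaining 10 frames at CPU:4. The sketch assumes the thread count in effect is known for each frame; `averages_by_threads` is a name invented for this example.

```python
from statistics import mean


def averages_by_threads(frames):
    """frames: iterable of (cpu_threads, seconds) pairs, one per observed frame."""
    by_threads = {}
    for cpu_threads, seconds in frames:
        by_threads.setdefault(cpu_threads, []).append(seconds)
    return {threads: mean(times) for threads, times in by_threads.items()}


# Hypothetical WU: 90 frames at 120 s (CPU:6), then 10 frames at 180 s (CPU:4).
frames = [(6, 120.0)] * 90 + [(4, 180.0)] * 10

mixed_average = mean(seconds for _, seconds in frames)  # 126.0
per_config = averages_by_threads(frames)                # {6: 120.0, 4: 180.0}
```

The mixed average of 126 s matches neither configuration, while keying on the thread count per frame keeps the 120 s and 180 s figures separate. The basic case only needs the thread count per WU; this exotic case needs it per frame.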
harlam357 wrote on Sat, 25 Aug 2018, re: Distinguish CPU slot by number of threads (#298):
Released with v0.9.17
Identified CPU or GPU is now included in the Benchmarks data along with actual CPU threads.
When I summarize data by slot, HFM doesn't mix data for one type of GPU with another, and I know which GPU model is in each slot. This doesn't work as well for CPU slots, because I do change the number of CPUs allocated to a slot. I'd like to be able to distinguish a slot running with CPU:M from the slot running with CPU:N (for M ≠ N).