Sniper uses the 'controller' module from Pin/SDE, which is described in detail here: https://www.intel.com/content/www/us/en/developer/articles/technical/pintool-regions.html However, it only supports the following events: event ::= start|stop|threadid|precond. In particular, "warmup_start/warmup_stop" is not currently supported, so something like "-pinplay:control warmup_start:address:0x3229:3981422787:global" is unfortunately not possible. A request to support warmup_start/stop in the controller has been noted.
Good news. Even though 'warmup-start/stop' events are not documented in the pintool-regions article (that will be fixed soon), the controller does support the events warmup-start and warmup-stop (note the '-' dash, not an '_' underscore). So something like "-pinplay:control warmup-start:address:0x3229:count3981422787:global" will work, in the sense that it will create an EVENT_WARMUP_START. The remaining task is for the Sniper SIFT recorder to actually handle that event and do the 'right' thing in terms of providing warmup. The file https://github.com/snipersim/snipersim/blob/main/sift/recorder/sift_recorder.cc seems relevant here, in particular the function at line 128, namely `VOID Handler(CONTROLLER::EVENT_TYPE ev, VOID *v, CONTEXT *ctxt, VOID *ip, THREADID tid, BOOL bcast)`.
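A wrong spelling of the event name is cheap to catch before a long run starts. The Python sketch below splits an address-based condition such as "warmup-start:address:0x3229:count3981422787:global" into its fields and flags an underscore-spelled event name. The helpers are hypothetical and are not part of Pin, SDE, or Sniper. The sketch only handles the address conditions used in this thread; the controller grammar in the pintool-regions article covers more.

```python
from typing import NamedTuple, Optional

# Event names seen in this thread: the documented set plus the
# undocumented warmup events (spelled with a dash, not an underscore).
KNOWN_EVENTS = {"start", "stop", "threadid", "precond",
                "warmup-start", "warmup-stop"}


class ControlSpec(NamedTuple):
    event: str            # e.g. "warmup-start"
    target: str           # e.g. "0x3229" or "chain+0x3229"
    count: Optional[int]  # N from a "countN" field, if present
    flags: frozenset      # e.g. {"global"} or {"bcast"}


def parse_control(spec: str) -> ControlSpec:
    """Split one 'event:address:target[:countN][:global|:bcast]' condition."""
    parts = spec.strip().split(":")
    if len(parts) < 3 or parts[1] != "address":
        raise ValueError(f"only address conditions are handled: {spec!r}")
    event, _, target, *rest = parts
    if event not in KNOWN_EVENTS:
        dashed = event.replace("_", "-")
        hint = f" (did you mean {dashed!r}?)" if dashed in KNOWN_EVENTS else ""
        raise ValueError(f"unknown event {event!r}{hint}")
    count = None
    flags = set()
    for field in rest:
        if field.startswith("count") and field[5:].isdigit():
            count = int(field[5:])
        elif field in ("global", "bcast"):
            flags.add(field)
        else:
            raise ValueError(f"unexpected field {field!r} in {spec!r}")
    return ControlSpec(event, target, count, frozenset(flags))


def parse_control_list(arg: str) -> list:
    """A single -control value may hold several comma-separated conditions."""
    return [parse_control(s) for s in arg.split(",")]
```

Given "warmup_start:address:0x3229:count3981422787:global", parse_control raises a ValueError that suggests 'warmup-start'.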
[ Thanks to @alenks for the Sniper details. ]
@hgpatil Thanks a lot for the help! This is very helpful information.
I managed to generate a warmup-start event with warmup_start:address:0x3229:3981422787:global and handle it in the SIFT handler https://github.com/snipersim/snipersim/blob/b58797c3993148174c7de23a18f71fc22e92340f/sift/recorder/sift_recorder.cc#L157 by adding:
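// 0x0be0000f must match SIM_USER_ROI registered in simuserroi.py;
// the last argument tells the script which event fired.
switch (ev)
{
    case CONTROLLER::EVENT_WARMUP_START:
        // arg 2: start of warmup
        handleMagic(tid, ctxt, SIM_CMD_USER, 0x0be0000f, 2);
        break;
    case CONTROLLER::EVENT_START:
        // arg 0: start of ROI
        handleMagic(tid, ctxt, SIM_CMD_USER, 0x0be0000f, 0);
        break;
    case CONTROLLER::EVENT_STOP:
        // arg 1: end of ROI
        handleMagic(tid, ctxt, SIM_CMD_USER, 0x0be0000f, 1);
        break;
    default:
        break;
}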
and rewrote script/simuserroi.py:
import sim

SIM_USER_ROI = 0x0be0000f

class SimUserROI:
    def setup(self, args):
        roiscript = sim.config.get_bool('general/roi_script')
        if not roiscript:
            print('[SimUserROI] ERROR: --roi-script is not set, but is required when using a start instruction count. Aborting')
            sim.control.abort()
            return
        sim.util.register_command(SIM_USER_ROI, self.set_roi)

    # Out-of-band set-roi: arg is the value passed by the SIFT recorder's controller handler
    def set_roi(self, cmd, arg):
        if arg == 0:    # start
            print('[SCRIPT] Start of ROI: beginning ROI')
            sim.control.set_roi(True)
        elif arg == 1:  # stop
            print('[SCRIPT] End of ROI: ending ROI')
            sim.control.set_roi(False)
        elif arg == 2:  # warmup start
            print('[SCRIPT] Start of WARMUP: beginning WARMUP')
            sim.control.set_instrumentation_mode(sim.control.WARMUP)

sim.util.register(SimUserROI())
However, the order in which these three events are generated isn't what I expected. The warmup-start event should come first, but it arrives after the stop event. I've attached my log output from PinPlay:
Normal Controller knob 0 : stop:address:chain+0x3229:count4456164370:global
Normal Controller knob 1 : start:address:chain+0x3229:count4329326580:global
Normal Controller knob 2 : warmup-start:address:chain+0x3229:count3981422787:global
TID4: event: start at icount: 224152389 ip: 0x55ccfec62223 handler: 0x000000000
TID4: event: stop at icount: 1152043991 ip: 0x55ccfec62223 handler: 0x000000000
TID1: event: warmup-start at icount: 26994001175 ip: 0x55ccfec62223 handler: 0x000000000
The program I'm running is a multi-threaded program.
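One way to check a controller log like the one above is to extract the fired events and compare their order with the expected warmup-start, start, stop sequence. The Python sketch below assumes the "TIDn: event: NAME at icount: N" line format shown in the log, and its function names are hypothetical. It orders events by their position in the log rather than by icount. The log lines come from different threads (TID4 and TID1), so their icounts may not be directly comparable.

```python
import re

# Matches controller-log lines such as
# "TID4: event: start at icount: 224152389 ip: 0x55ccfec62223 handler: 0x000000000"
EVENT_RE = re.compile(r"TID(\d+): event: (\S+) at icount: (\d+)")

# Order in which the region events are expected to fire.
EXPECTED_ORDER = ("warmup-start", "start", "stop")


def fired_events(log_text):
    """Return (event, tid, icount) tuples in the order they appear in the log."""
    return [(m.group(2), int(m.group(1)), int(m.group(3)))
            for m in EVENT_RE.finditer(log_text)]


def out_of_order(log_text, expected=EXPECTED_ORDER):
    """Return consecutive expected pairs (a, b) where b was logged before a.

    Events are compared by log position, not by icount, because the
    icounts on these lines may come from different threads.
    """
    seen = [name for name, _, _ in fired_events(log_text)]
    bad = []
    for first, second in zip(expected, expected[1:]):
        if first in seen and second in seen and seen.index(first) > seen.index(second):
            bad.append((first, second))
    return bad
```

On the log above, out_of_order returns [('warmup-start', 'start')], meaning warmup-start fired after start. The "Normal Controller knob" lines do not match the pattern and are ignored.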
Good progress.
There is a chance you are using an outdated version of the controller. You can test the reachability of various regions using the SDE-based tool 'sde-global-event-icounter.so'. Please use SDE 9.14 (not the latest):
export SDE_BUILD_KIT=path to sde-external-9.14.0-2022-10-25-lin (including the SDE kit name)
git clone https://github.com/intel/pinplay-tools
cd pinplay-tools/GlobalLoopPoint
./sde-build-GlobalLoopPoint.sh (follow the instructions about setting PINBALL2ELF)
Here's how the included 'openmp' program was tested for region reachability:
export OMP_NUM_THREADS=8
$SDE_BUILD_KIT/sde -t sde-global-event-icounter.so -thread_count 8 -prefix foo -control warmup-start:address:dotproduct-omp+0x1468:count7998391 -control start:address:dotproduct-omp+0x14d8:count5821991 -control stop:address:dotproduct-omp+0x14d8:count5821991 -- ./dotproduct-omp
This showed:
global icount 460960064 Warmup-Start
global icount 460960064 Late-Warmup-Start
global icount 540960151 Sim-Start
global icount 540960151 Sim-End
global icount 540960151 Late-Sim-Start
global icount 540960151 Late-Sim-End
i.e. all the region events are reported in the expected order.
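The same ordering check can be applied to the sde-global-event-icounter.so output. The Python sketch below assumes the "global icount N Event-Name" format shown above, where several entries may end up on one line after copy-pasting. The helper names are hypothetical.

```python
import re

# Matches entries like "global icount 460960064 Warmup-Start"; several
# entries may appear on a single line.
ICOUNT_RE = re.compile(r"global icount (\d+) ([A-Za-z-]+)")


def region_icounts(output):
    """Map each reported event name (e.g. 'Sim-Start') to its global icount.

    If a name appears more than once, the last occurrence wins.
    """
    return {name: int(count) for count, name in ICOUNT_RE.findall(output)}


def in_expected_order(output, order=("Warmup-Start", "Sim-Start", "Sim-End")):
    """True if the reported global icounts never decrease along `order`.

    Events missing from the output are skipped.
    """
    counts = region_icounts(output)
    values = [counts[name] for name in order if name in counts]
    return all(a <= b for a, b in zip(values, values[1:]))
```

For the dotproduct-omp output above, in_expected_order returns True.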
export SDE_BUILD_KIT=path-to-sde-9.14
I was using SDE 9.14 from the Sniper package, but I can't build the sde-global-event-icounter.so tool. Running ./sde-build-GlobalLoopPoint.sh gives me an error:
global_event_icounter.cpp:27:10: fatal error: pinball-sysstate.H: No such file or directory
27 | #include "pinball-sysstate.H"
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [/home/joydong/Desktop/spec/sde-external-9.14.0-2022-10-25-lin/pinkit/source/tools/Config/makefile.default.rules:233: obj-intel64/global_event_icounter.o] Error 1
Is there something I need to include in the path?
On the other hand, I notice that the program sometimes deadlocks, with low CPU utilization, while generating BBVs for some benchmarks. I suspect it might have something to do with the OpenMP runtime (-fopenmp). Is there anything I need to pay attention to when working with OpenMP programs? Some of the benchmarks I'm running use OpenMP.
Here are all the steps I take to generate BBVs and start the simulation. pin_hook_init and pin_hook_fini are labels I inserted into my benchmarks around the compute phase I'm trying to analyze. This is from dbg.cfg:
[Parameters]
program_name: dbg
input_name: 0
command: ./dbg ../../input-datasets/dbg/large/ERR194147-mem2-chr22.bam chr22:0-50818468 ../../input-datasets/dbg/large/Homo_sapiens_assembly38.fasta 10
[gen fat pinball]
/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/sde_pinpoints.py --delete --mode mt --sdehome=/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin --cfg /mnt/sda/spec/genomicsbench/benchmarks/dbg/dbg.cfg --log_options -control start:address:pin_hook_init:bcast,stop:address:pin_hook_fini:bcast -log:fat -log:mp_atomic 0 -log:mp_mode 0 -log:strace -log:basename /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0/dbg.0 --replay_options=-replay:strace -l
[gen dcfg]
/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/replay.py --pintool=sde-global-looppoint.so --pintool_options -dcfg -replay:deadlock_timeout 0 -replay:strace -dcfg:out_base_name /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0/dbg.0_2055763 /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0/dbg.0_2055763
[gen bbv]
/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/sde_pinpoints.py --pintool=sde-global-looppoint.so --sdehome=/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin --global_regions --pccount_regions --cfg /mnt/sda/spec/genomicsbench/benchmarks/dbg/dbg.cfg --whole_pgm_dir /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0 --mode mt -S 0 -b --replay_options -replay:deadlock_timeout 0 -global_profile -emit_vectors 0 **-filter_exclude_lib libgomp.so.1** **-filter_exclude_lib libiomp5.so** -looppoint:global_profile -looppoint:dcfg-file /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0/dbg.0_2055763.replay.dcfg.json.bz2 -looppoint:main_image_only 1 -looppoint:loop_info dbg.0.loop_info.txt -flowcontrol:verbose 1 -flowcontrol:quantum 1000000 -flowcontrol:maxthreads 10
[gen cluster]
/mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/sde_pinpoints.py --pintool=sde-global-looppoint.so --global_regions --pccount_regions --cfg /mnt/sda/spec/genomicsbench/benchmarks/dbg/dbg.cfg --whole_pgm_dir /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/whole_program.0 -S 0 --warmup_factor=1 --maxk=50 --dimensions=100 --append_status -s --simpoint_options= -dim 100 -coveragePct 1.0 -maxK 50
[run sniper]
/run-sniper -n 10 -v -sprogresstrace:10000000 -gtraceinput/timeout=2000 -gscheduler/type=static -cicelake_s --trace-args=-sniper:flow 1000 -ssimuserwarmup --roi-script --trace-args=-pinplay:control stop:address:dbg+0x7520:count13687183217:global --trace-args=-pinplay:control warmup-stop:address:dbg+0x7520:count13686433631:global --trace-args=-pinplay:control start:address:dbg+0x7520:count13685799593:global --trace-args=-pinplay:controller_log 1 --trace-args=-pinplay:controller_olog /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/simulation/r1/pinplay_controller.log -ggeneral/inst_mode_init=fast_forward -gperf_model/fast_forward/oneipc/include_memory_latency=true -d /mnt/sda/spec/genomicsbench/benchmarks/dbg/custom-dbg-0-test-passive-10-20230707152450/simulation/r1 -- ./dbg ../../input-datasets/dbg/large/ERR194147-mem2-chr22.bam chr22:0-50818468 ../../input-datasets/dbg/large/Homo_sapiens_assembly38.fasta 10
./sde-build-GlobalLoopPoint.sh should instruct you about setting PINBALL2ELF:
PINBALL2ELF not defined
Point to a clone of https://github.com/intel/pinball2elf
That should let you see the file .../pinball2elf/pintools/PinballSYSState/pinball-sysstate.H
Thanks a lot. I managed to get it running through my pipeline. It looks great: clusters are generated, and events are triggered in sniper in the right order.
I then tried to apply the same script and environment to my benchmarks. Unfortunately, I noticed that in the gen bbv step the replayer sometimes deadlocks, or no BBV is generated, when running my benchmarks that also use OpenMP.
Are there any additional steps I need to take for OpenMP programs? I notice that there is an option in sde_pinpoints.py to filter out OpenMP spins. Also, the gen bbv command above, which I took from LoopPoint, excludes the OpenMP libraries during replay with -filter_exclude_lib libgomp.so.1 -filter_exclude_lib libiomp5.so
One more thing I want to ask: some of my benchmarks have a long loading stage before the compute stage I care about starts. I've inserted the assembly labels pin_hook_init and pin_hook_fini into the benchmark around the compute stage. In the fat pinball generation stage I added -control start:address:pin_hook_init:bcast,stop:address:pin_hook_fini:bcast to skip logging outside the compute stage, which made things significantly faster. Should I also use the same control flags for the dcfg and bbv stages?
@hgpatil, LoopPoint tools on GitHub is already based on SDE 9.14. The problem @joydddd pointed out initially -- the warmup event occurring after the detailed simulation region -- was with Sniper, right? So, is that working correctly now, @joydddd?
The deadlock issue may be happening within the flowcontrol code. You could try running the BBV generation step disabling flowcontrol and see if the issue persists.
@alenks No, the issue of the warmup event occurring after the detailed simulation region persists. I don't think the problem is with Sniper. It shows up during the simulation phase, but PinPlay is responsible for generating the EVENT_START, EVENT_WARMUP_START, and EVENT_STOP events that trigger Sniper to switch simulation modes.
Here is the command that Sniper's Python script generates to run Pin, where PinPlay is responsible for generating the -control events, followed by the controller log. The log shows that PinPlay generates the events in the wrong order:
[SIFT_RECORDER] Running /mnt/sda/spec/looppoint/tools/sniper/pin_kit/pin -mt -injection child -xyzzy -ifeellucky -follow_execv 1 -t /mnt/sda/spec/looppoint/tools/sniper/sift/recorder/obj-intel64/sift_recorder -sniper:verbose 1 -sniper:debug 0 -sniper:roi 0 -sniper:roi-mpi 0 -sniper:f 0 -sniper:d 0 -sniper:b 0 -sniper:o /tmp/tmpcpvHLE/run_benchmarks -sniper:e 1 -sniper:s 0 -sniper:r 1 -sniper:pa 0 -sniper:rtntrace 1 -sniper:stop 0 -sniper:flow 1000 -pinplay:control stop:address:chain+0x3229:count4456164370:global -pinplay:control start:address:chain+0x3229:count4329326580:global -pinplay:control warmup-start:address:chain+0x3229:count3981422787:global -pinplay:controller_log 1 -pinplay:controller_olog /mnt/sda/spec/genomicsbench/benchmarks/chain/custom-chain-0-test-passive-10-20230713104623/simulation/r1/pinplay_controller.log -- ./chain -i ../../input-datasets/chain/large/c_elegans_40x.10k.in -o ../../input-datasets/chain/large/c_elegans_40x.10k.out -t 10
Normal Controller knob 0 : stop:address:chain+0x3229:count4456164370:global
Normal Controller knob 1 : start:address:chain+0x3229:count4329326580:global
Normal Controller knob 2 : warmup-start:address:chain+0x3229:count3981422787:global
TID4: event: start at icount: 224179560 ip: 0x5622b42f1223 handler: 0x000000000
TID3: event: stop at icount: 1183105926 ip: 0x5622b42f1223 handler: 0x000000000
TID4: event: warmup-start at icount: 27031693813 ip: 0x5622b42f1223 handler: 0x000000000
@joydddd Right, so were you able to see the events in the right order using the sde-global-event-icounter.so tool (with SDE-9.14) like @hgpatil mentioned? Sniper, as downloaded with the LoopPoint kit, uses PinPlay-3.11 as the frontend by default now. If you could verify that the sde-global-event-icounter.so tool works correctly, we may need to use SDE-9.14 as the Sniper frontend instead.
Thanks for the quick reply! I tried the sde-global-event-icounter.so tool with the benchmark that was triggering warmup-start and start in the wrong order. It prints:
Running with threads: 10
PIN START
global icount 1167266636481 Warmup-Start
global icount 1167266658085 Late-Warmup-Start
global icount 1187260825052 Sim-Start
global icount 1187260842479 Late-Sim-Start
global icount 1197281815866 Sim-End
global icount 1197281848459 Late-Sim-End
PIN END
Time in kernel: 8003.59 sec
which is in the expected order. I'll try using sde-9.14 instead of pinplay-3.11 as the sniper frontend and see if that solves the problem.
@joydddd Some issues were reported while compiling Sniper with SDE-9.14, which haven't been fixed yet. I'd recommend using Pin-3.22 as the Sniper frontend instead (make distclean && make USE_PIN=1 will do that) if you don't need any PinPlay-specific features. I just tested it, and the events occurred in the expected order. I had to use the absolute PCs instead of Image + PC Offset, and keep setarch x86_64 -R before the run-sniper command to disable ASLR.
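An absolute PC only identifies the right instruction for one specific load base of the image, which is presumably why ASLR has to be off. The Python sketch below shows the arithmetic. The helpers are hypothetical and not part of Sniper or SDE. The load base is the absolute PC minus the image offset. A PC recorded under one base can be shifted to another base by the same offset. If the profiling run and the simulation run load the binary at different bases, the recorded absolute PCs would need this kind of rebasing, or both runs would need the same fixed base.

```python
def load_base(abs_pc: int, offset: int) -> int:
    """Load base implied by a recorded (absolute PC, image offset) pair."""
    if offset > abs_pc:
        raise ValueError("offset cannot exceed the absolute PC")
    return abs_pc - offset


def rebase(abs_pc: int, old_base: int, new_base: int) -> int:
    """Translate a PC recorded under one load base to another load base."""
    return abs_pc - old_base + new_base


# Example values: an absolute PC and its offset inside the 'chain' image.
recorded_pc, offset = 0x55C7ECFD1229, 0x3229
old_base = load_base(recorded_pc, offset)
print(hex(old_base))  # 0x55c7ecfce000
```

The new base would have to be observed in the simulation run itself, for example from the process memory map. Any value used here is only illustrative.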
@alenks Thanks for the suggestion! I have used Pin as the frontend the whole time, as I didn't need any PinPlay-specific features, but I used image + offset. For example, a pinpoint I generated looks like this:
# RegionId = 1 Slice = 29 Icount = 290000004592 Length = 9999997251 Weight = 0.65909 Multiplier = 29.000 ClusterSlicecount = 29 ClusterIcount = 290000682735
#Start: pc : 0x55c7ecfd1229 image: chain offset: 0x3229 absolute_count: 4329326580 source-info: Unknown:0
#End: pc : 0x55c7ecfd1229 image: chain offset: 0x3229 absolute_count: 4456164370 relative_count: 17391687.0 source-info: Unknown:0
cluster 0 from slice 29,global,1,0x55c7ecfd1229,chain,0x3229,4329326580,0x55c7ecfd1229,chain,0x3229,4456164370,17391687,9999997251,0.65909,29.000,simulation
Instead of using chain+0x3229, I will test using 0x55c7ecfd1229 with the Pin frontend.
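The Python sketch below builds the -pinplay:control arguments in either form (image + offset, or absolute PC) from a pinpoint line like the one above. The column names are inferred from the '#Start'/'#End' comments of that pinpoint and are not a documented format. The helpers are hypothetical.

```python
# Column order inferred from the '#Start'/'#End' comments of the pinpoint above;
# treat it as an assumption, not a documented file format.
FIELDS = ["name", "scope", "region_id",
          "start_pc", "start_image", "start_offset", "start_count",
          "end_pc", "end_image", "end_offset", "end_count",
          "relative_count", "length", "weight", "multiplier", "kind"]


def parse_region(line: str) -> dict:
    """Split one comma-separated region line into named fields."""
    row = line.strip().split(",")
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(row)}")
    return dict(zip(FIELDS, row))


def control_args(region: dict, absolute_pc: bool) -> list:
    """Build start/stop condition strings for -pinplay:control."""
    def where(side: str) -> str:
        if absolute_pc:
            return region[f"{side}_pc"]  # e.g. 0x55c7ecfd1229
        # e.g. chain+0x3229
        return f"{region[side + '_image']}+{region[side + '_offset']}"

    return [f"start:address:{where('start')}:count{region['start_count']}:global",
            f"stop:address:{where('end')}:count{region['end_count']}:global"]
```

Each returned string would be passed after --trace-args=-pinplay:control on the run-sniper command line, as in the [run sniper] step earlier in the thread. The absolute-PC form only makes sense if the binary is loaded at the same base as when the pinpoint was recorded.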
One more question: I was testing PinPlay with another benchmark on a new machine, so I had to set up PinPlay, pinball2elf, and SDE again in the new environment. I ran into this error while running sde-build-GlobalLoopPoint.sh:
global_isimpoint_inst.H: In constructor ‘GLOBALBLOCK::GLOBALBLOCK(const BLOCK_KEY&, INT32, INT32, INT32)’:
global_isimpoint_inst.H:78:57: error: no matching function for call to ‘BLOCK::BLOCK(const BLOCK_KEY&, INT32&, INT32&, INT32&, int, int)’
78 | : BLOCK(key, instructionCount, id, imgId,1,FALSE)
| ^
In file included from global_isimpoint_inst.H:22,
from global_isimpoint_inst.cpp:18:
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:158:5: note: candidate: ‘BLOCK::BLOCK(const BLOCK_KEY&, INT64, INT32, INT32)’
158 | BLOCK(const BLOCK_KEY& key, INT64 instructionCount, INT32 id, INT32 imgId);
| ^~~~~
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:158:5: note: candidate expects 4 arguments, 6 provided
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:155:7: note: candidate: ‘BLOCK::BLOCK(const BLOCK&)’
155 | class BLOCK
| ^~~~~
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:155:7: note: candidate expects 1 argument, 6 provided
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:155:7: note: candidate: ‘BLOCK::BLOCK(BLOCK&&)’
/home/joydong/spec/sde/pinkit/sde-example/include/isimpoint_inst.H:155:7: note: candidate expects 1 argument, 6 provided
and
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-dangling-pointer’ [-Werror]
cc1plus: all warnings being treated as errors
Do you have any clues about the error?
Thank you so much!
Joy
pinplay-tools was recently updated to support the latest SDE (version 9.21), and that is now the default. SDE 9.21 has some header changes which require the pinplay-tools update. You may either:
Thanks! I'll use the -DOLDSDE flag to fix the problem.
"export CFLAGS=-DOLDSDE" before running sde-build-GlobalLoopPoint.sh should do it.
I was running through all my benchmarks. Although most benchmarks worked fine, one ran into this error while using the sde-global-event-icounter.so tool:
$SDE_BUILD_KIT/sde -t sde-global-event-icounter.so -prefix foo -thread_count 10 -control start:address:pin_hook_init,stop:address:pin_hook_fini -controller_log 1 -controller_olog /mnt/sda/spec/genomicsbench/benchmarks/kmer-cnt/roi/roi-controller.log -- ./kmer-cnt --reads ../../input-datasets/kmer-cnt/large/Loman_E.coli_MAP006-1_2D_50x.fasta --config ../../tools/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --threads 10 2>&1 | tee /mnt/sda/spec/genomicsbench/benchmarks/kmer-cnt/roi/roi.out
[2023-07-20 06:27:08] INFO: Reading sequences
global icount 9545567829 Sim-Start
global icount 9545567829 Late-Sim-Start
PIN START
[2023-07-20 06:27:23] INFO: Counting k-mers:
0% ERROR: Unexpected tid 10 check '-thread_count' value 10
(if using replay, provide '-xyzzy -replay:deadlock_timeout 0')
A: global_event_icounter.cpp: ThreadStart: 360: assertion failed: FALSE
################################################################################
## STACK TRACE
################################################################################
??? at /mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/intel64/sde-global-event-icounter.so+0x00005eafa
??? at /mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/intel64/sde-global-event-icounter.so+0x00052278a
LEVEL_VM::VM_THREAD::Attach+0x00000027d at /mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/intel64/pinbin+0x00025741d
LEVEL_VM::VM_THREAD::Run+0x000000010 at /mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/intel64/pinbin+0x000257690
DoClone+0x0000001e9 at /mnt/sda/spec/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/intel64/pinbin+0x000309da9
??? at libc-dynamic.so+0x00007f3ed
??? at libc-dynamic.so+0x00007fb94
Pin: pin-3.25-98650-8f6168173
Copyright 2002-2022 Intel Corporation.
Aborted
@joydddd, we checked in a version of Sniper (on the dev-sde branch) that supports the latest SDE (9.24) as the frontend. Please try it out and let us know if you face any issues.
Hi @alenks, thank you so much for providing that! I'll try using it.
Hi! Thanks a lot for creating this project. It has been a great help in building my simulation pipeline with Sniper. I'd like to know whether there is documentation on how to use PinPlay to set up a warmup region for Sniper.
My current setup: I built my pipeline with LoopPoint (https://github.com/nus-comparch/looppoint). I used my own benchmark and successfully generated mt_pinballs, DCFG, BBVs, and clusters. Clusters are generated for both the ROI and the warmup region.
Then I start the Sniper simulation with
where PinPlay generates EVENT_START and EVENT_STOP from
-pinplay:control start:address:chain+0x3229:count4329326580:global -pinplay:control stop:address:chain+0x3229:count4456164370:global
and the snipersim pinkit handles these two events and controls the simulation.
I want to update the PinPlay script so that it also generates EVENT_WARMUP_START at address:0x3229:3981422787:global to trigger the Sniper handler to start a warmup region.