nus-comparch / looppoint

Sampled simulation of multi-threaded applications using LoopPoint methodology
https://looppoint.github.io

Pin stack overflow occurs when running SPEC CPU2017 621.wrf_s. #7

Open icyclv opened 3 months ago

icyclv commented 3 months ago

Hi, thank you for LoopPoint. I've recently been trying to use it to collect representative regions for SPEC CPU. Following the README, I successfully ran SPEC CPU2017 603.bwaves_s (command: ./speed_bwaves_base.icc bwaves_1 < bwaves_1.in). However, when I run SPEC CPU2017 621.wrf_s, I get a Pin stack overflow error. Do I need to adjust any settings?

Here is my cfg file:

[Parameters]
program_name: loopiccgo2
input_name: 1
command: ./wrf_s_base.icc
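
For readers unfamiliar with this layout, here is a minimal sketch of how a cfg like the one above can be read with Python's standard `configparser`, which accepts `key: value` pairs under a section header. Whether LoopPoint's own scripts parse the file this way is an assumption; the helper name `read_parameters` is hypothetical and used only for illustration.

```python
import configparser

# The cfg from this issue, reproduced verbatim.
CFG_TEXT = """\
[Parameters]
program_name: loopiccgo2
input_name: 1
command: ./wrf_s_base.icc
"""


def read_parameters(cfg_text):
    """Return the [Parameters] section of a LoopPoint-style cfg as a dict.

    Hypothetical helper: it only illustrates the file layout and is not
    taken from the LoopPoint scripts.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser["Parameters"])


params = read_parameters(CFG_TEXT)
print(params["program_name"], params["input_name"], params["command"])
```

Note that all values come back as strings, so `input_name` is `"1"` rather than an integer.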

The section of the log that shows the exception is as follows:

***  Finished generating whole program pinballs [log_whole]  ***    April 03, 2024 17:28:13

+++  Using whole program pinballs in dir: whole_program.1

***  TRACING: END  ***    April 03, 2024 17:28:13
Running commands:
/mnt/hdd/users/ycchang/code/performance/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/pinplay-scripts/replay.py --pintool=sde-global-looppoint.so  --pintool_options -dcfg -replay:deadlock_timeout 0 -replay:strace -dcfg:out_base_name /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240403172006/whole_program.1/loopiccgo2.1_2882515 /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240403172006/whole_program.1/loopiccgo2.1_2882515

......
WRF NUMBER OF TILES =   8
......
wrf: SUCCESS COMPLETE WRF
In: 
Thread: 0
PID: 2889645
SYSTEM TID: 2889645
Exception code: ACCESS_DENIED
Exception Class: 2
Faulty AccessType : 0
Exception address: 0x14923732b008
E: Pin stack overflow in thread 2889645

It also causes subsequent tasks to fail. The complete log file is attached.

Thank you for your project, and I look forward to your response.

looppoint.log.txt

alenks commented 3 months ago

I'm not quite sure what is causing the failure. Can you rerun with flow control turned off (--no-flowcontrol)? It would also be helpful to know whether the problem occurs with other Pin/SDE tools using the same binary. Can you try running a simple SDE tool, such as the mix tool ($SDE_BUILD_KIT/sde -mix -- ./wrf_s_base.icc), to verify that?
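
If it helps to script that sanity check, the sketch below assembles the same `sde -mix` command line quoted above. The helper name `sde_mix_command` and the placeholder kit path are hypothetical; the only inputs taken from this thread are the `SDE_BUILD_KIT` variable, the `-mix` option, and the binary name.

```python
import os
import shlex


def sde_mix_command(binary, sde_kit=None):
    """Build the argv list for `$SDE_BUILD_KIT/sde -mix -- <binary>`.

    Hypothetical helper. If sde_kit is not given, the SDE_BUILD_KIT
    environment variable is used and must be set.
    """
    kit = sde_kit if sde_kit is not None else os.environ["SDE_BUILD_KIT"]
    return [os.path.join(kit, "sde"), "-mix", "--", binary]


# "/path/to/sde-kit" is a placeholder, not a real install location.
cmd = sde_mix_command("./wrf_s_base.icc", sde_kit="/path/to/sde-kit")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd=...)` from the benchmark's run directory; a clean exit there, contrasted with the failure under the LoopPoint tool, would point at the tool rather than at SDE in general.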

icyclv commented 3 months ago

With --no-flowcontrol, the error still occurs. However, 'sde -mix' appears to run normally.

I also tested other SPEC CPU workloads (such as pop2), and they run without error, so I suspect the issue is specific to WRF.

alenks commented 3 months ago

I have added a new flag, --binary-profile, to the run-looppoint.py script. It profiles the application binary directly instead of relying on a pinball. Could you test this option with wrf to see if it resolves the issue?

icyclv commented 3 months ago

Sorry, I tested WRF with the --binary-profile flag, but it still shows a stack overflow error. Here's the log:

Running commands:
/mnt/hdd/users/ycchang/code/performance/looppoint/tools/sde-external-9.14.0-2022-10-25-lin/sde64 -t sde-global-looppoint.so -dcfg -dcfg:out_base_name /mnt/hdd/users/ycchang/code/performance/looppoint/apps/wrf_icc_o2/custom-loopiccgo2-1-test-passive-8-20240408170440/whole_program.1/dcfg-out -- ./wrf_s_base.icc

......
WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   8
......
wrf: SUCCESS COMPLETE WRF
In: 
Thread: 0
PID: 2891854
SYSTEM TID: 2891854
Exception code: ACCESS_DENIED
Exception Class: 2
Faulty AccessType : 0
Exception address: 0x1515921a5008
E: Pin stack overflow in thread 2891854

alenks commented 3 months ago

@hgpatil, have you seen this problem before? @icyclv, in the meantime, could you find out exactly what is causing the issue by enabling debugging (-pause_tool 20)? See the link for details.