yuyangJin / PerFlow

Domain-specific framework for performance analysis of parallel programs

Question: after running make in the build directory, how do I run the samples in test or example? #3

Status: Open. ghostsea opened this issue 10 months ago.

ghostsea commented 10 months ago

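I have already run cmake in the perflow directory and finished make in build. However, I haven't found any instructions on how to run the samples to get analysis results. How should I do this?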

Also, in builtin's test, the .sh file sets the path GPERF_DIR=/mnt/home/jinyuyang/MY_PROJECT/BaguaTool/build/example/project/graph_perf. What is this path supposed to contain?

The .py files in example contain proj_dir = os.environ['BAGUA_DIR']. What path is the BAGUA_DIR environment variable supposed to hold?

ghostsea commented 10 months ago

I tried setting BAGUA_DIR to the PerFlow path. Running the Python script in example then fails with the following errors:

python3 communication_pattern_analysis.py /usr1/PerFlow/build/example/comm_pattern_analysis/cg.B.x-64p-20231219-102348/static_data/cg.B.x.pag
0.50user 0.01system 0:00.10elapsed 489%CPU (0avgtext+0avgdata 80264maxresident)k
0inputs+744outputs (15major+17432minor)pagefaults 0swaps
original_GOMP_parallel = 0x7fcc94d268a0
SET sampling interval to 3100000 cycles
PAPI_add_events(EventSet, (int *)Events, NUM_EVENTS), ErrCode: Component containing event is disabled
PAPI_overflow(EventSet, PAPI_TOT_CYC, this->cyc_sample_count, 0, _papi_overflow_handler), ErrCode: Component Index isn't set
PAPI_start(EventSet), ErrCode: Component Index isn't set
srun: error: s_p_parse_file: unable to status file /etc/slurm-llnl/slurm.conf: No such file or directory, retrying in 1sec up to 60sec

What do I need to do here?

yuyangJin commented 10 months ago

Hi, thanks for your interest:

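  1. The GPERF_DIR path isn't used for anything; it is probably a leftover script from before the refactoring.
  2. BAGUA_DIR is the path to the PerFlow system, also left over from the refactoring.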
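The example scripts read `proj_dir = os.environ['BAGUA_DIR']` directly, so they die with a bare `KeyError` when the variable is unset. The following is a minimal sketch of a more forgiving lookup, assuming BAGUA_DIR should point at the PerFlow source directory as described above. `resolve_perflow_dir` is a hypothetical helper, not part of PerFlow.

```python
import os
from pathlib import Path


def resolve_perflow_dir(var: str = "BAGUA_DIR") -> Path:
    """Hypothetical helper: return the PerFlow directory named by an env variable.

    Unlike os.environ[var], this fails with an explanatory message when the
    variable is missing and checks that the path is an existing directory.
    """
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(
            f"{var} is not set; export it to the PerFlow directory, "
            f"e.g. export {var}=/path/to/PerFlow"
        )
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise RuntimeError(f"{var}={value!r} is not an existing directory")
    return path
```

In practice, exporting the variable before launching a script is enough, e.g. `export BAGUA_DIR=/usr1/PerFlow` (the path used in this thread).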
yuyangJin commented 10 months ago

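It looks like the static analysis succeeded, but the PAPI events are not supported at runtime. Please use papi_avail to check which events are available. You need to set the system parameter /proc/sys/kernel/perf_event_paranoid to 0 or 1; see https://ptools-perfapi.eecs.utk.narkive.com/zqC46WJG/make-test-failed-papi-tot-cyc-is-not-available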
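The perf_event_paranoid check can be scripted before launching an example. This is a minimal Python sketch, assuming the Linux procfs layout. The helper names are hypothetical and not part of PerFlow or PAPI. The threshold of 1 simply follows the advice above (0 or 1).

```python
from pathlib import Path
from typing import Optional

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_event_paranoid(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current perf_event_paranoid level, or None if the file is absent."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def paranoid_level_ok(level: Optional[int], threshold: int = 1) -> bool:
    """True if the level is at or below the suggested threshold (0 or 1)."""
    return level is not None and level <= threshold
```

If the check fails, the level can be lowered as root with `sysctl -w kernel.perf_event_paranoid=1`, which lasts until the next reboot. Then rerun `papi_avail` to confirm that PAPI_TOT_CYC is reported as available.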

ghostsea commented 10 months ago

Thanks for the answers above. I have continued debugging in an environment where PAPI works correctly.

However, when I run python3 pag_validation.py in the example directory, the static analysis appears to complete normally, but the dynamic analysis fails with

Abort(1142789) on node 0: Fatal error in internal_Comm_size: Invalid communicator, error stack:
internal_Comm_size(30769): MPI_Comm_size(comm=0x0, size=0x564ce11c9760) failed
internal_Comm_size(30723): Invalid communicator

Many groups of similar errors follow. At the end there is:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 100756 RUNNING AT x
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
original_GOMP_parallel = 0x7f62d12508a0
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

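Because this step fails, none of the files in the dynamic analysis folder are generated, so I can't go any further. How should I handle this? My environment is an Ubuntu 20.04 virtual machine under WSL on x64 Windows.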

Also, could you provide the steps for correctly running one of the repository's samples after make? And how do test and example in the repository each work?

yuyangJin commented 10 months ago

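  1. I think the first error is an MPI wrapper compatibility issue. I haven't adapted the wrapper to every MPI library yet; currently OpenMPI 4.x is supported.
  2. I'm not sure whether the experiments can be fully reproduced on an Ubuntu virtual machine; I haven't run it on one yet.
  3. I wrote a README file earlier for AE (artifact evaluation); I'll find it and push it.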
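The wrapper is currently adapted only to OpenMPI 4.x. So it is worth confirming which MPI the `mpirun` on PATH belongs to. The `Abort(...) on node 0` and `BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES` messages in the log above look like output from the MPICH family's Hydra launcher, which suggests (but does not prove) a non-OpenMPI runtime. The following is a minimal sketch that classifies `mpirun --version` output. It assumes OpenMPI prints `mpirun (Open MPI) X.Y.Z` and MPICH-family launchers print a `HYDRA build details` banner. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def detect_mpi(version_text: str) -> Tuple[str, Optional[str]]:
    """Classify `mpirun --version` output as (flavor, version)."""
    m = re.search(r"\(Open MPI\)\s+(\d+(?:\.\d+)*)", version_text)
    if m:
        return "openmpi", m.group(1)
    if "HYDRA" in version_text:
        # MPICH and its derivatives use the Hydra process manager.
        m = re.search(r"Version:\s+(\S+)", version_text)
        return "mpich-family", m.group(1) if m else None
    return "unknown", None


def is_openmpi_4x(version_text: str) -> bool:
    """True only for OpenMPI with major version 4."""
    flavor, version = detect_mpi(version_text)
    return flavor == "openmpi" and version is not None and version.split(".")[0] == "4"


def mpirun_version_text() -> Optional[str]:
    """Run `mpirun --version` if mpirun is on PATH and return its combined output."""
    exe = shutil.which("mpirun")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

Usage: `print(detect_mpi(mpirun_version_text() or ""))`. If this does not report OpenMPI 4.x, one option is to install OpenMPI 4.x, rebuild PerFlow against it, and launch the samples with that OpenMPI's `mpirun`.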