gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.

ptx_parse() function doesn't return when executing different applications #287

Open pipijing13 opened 7 months ago

pipijing13 commented 7 months ago

I ran into the same problem described in https://github.com/gpgpu-sim/gpgpu-sim_distribution/issues/160 when executing applications with GPGPU-Sim 4.0. Execution stops after this point:

######### cuobjdump parser ########
Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_13
Adding identifier: vectoradd.cu
Adding new section ELF
Adding arch: sm_13
Adding identifier: vectoradd.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z6vecAddPdS_S_i : hostFun 0x0x400c40, fat_cubin_handle = 1//stopped here

After tracing the execution, I found that the ptx_parse() function in ptx_parser.cc does not return, which was also reported in the earlier issue. I tried several applications, including vectoradd and simple_add, and got the same result each time.

I then ran the applications with GPGPU-Sim 3.0, and the problem did not occur. A file called _1.ptx was generated, with the following content:

.version 1.4
.target sm_13

.file 1 ""
.file 2 "/tmp/tmpxft_00003cca_00000000-6_vectoradd.cudafe2.gpu"
.file 3 "/sciclone/home/ysun32/packages/gcc-4.5.1/lib/gcc/x86_64-unknown-linux-gnu/4.5.1/include/stddef.h"
.file 4 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/crt/device_runtime.h"
.file 5 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/host_defines.h"
.file 6 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/builtin_types.h"
.file 7 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/device_types.h"
.file 8 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/driver_types.h"
.file 9 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/surface_types.h"
.file 10 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/texture_types.h"
.file 11 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/vector_types.h"
.file 12 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/device_launch_parameters.h"
.file 13 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/crt/storage_class.h"
.file 14 "vectoradd.cu"
.file 15 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/common_functions.h"
.file 16 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/math_functions.h"
.file 17 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/math_constants.h"
.file 18 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/device_functions.h"
.file 19 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_11_atomic_functions.h"
.file 20 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_12_atomic_functions.h"
.file 21 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_13_double_functions.h"
.file 22 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_20_atomic_functions.h"
.file 23 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_20_intrinsics.h"
.file 24 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/sm_30_intrinsics.h"
.file 25 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/surface_functions.h"
.file 26 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/texture_fetch_functions.h"
.file 27 "/sciclone/home/ysun32/packages/cuda4.2/cuda/bin/../include/math_functions_dbl_ptx3.h"

.entry _Z6vecAddPdS_S_i (
    .param .u64 cudaparmZ6vecAddPdS_S_i_a,
    .param .u64 cudaparmZ6vecAddPdS_S_i_b,
    .param .u64 cudaparmZ6vecAddPdS_S_i_c,
    .param .s32 cudaparmZ6vecAddPdS_S_i_n)
{
    .reg .u16 %rh<4>;
    .reg .u32 %r<6>;
    .reg .u64 %rd<10>;
    .reg .f64 %fd<5>;
    .reg .pred %p<3>;
    .loc 14 21 0
$LDWbeginZ6vecAddPdS_S_i:
    mov.u16 %rh1, %ctaid.x;
    mov.u16 %rh2, %ntid.x;
    mul.wide.u16 %r1, %rh1, %rh2;
    cvt.u32.u16 %r2, %tid.x;
    add.u32 %r3, %r2, %r1;
    ld.param.s32 %r4, [cudaparmZ6vecAddPdS_S_i_n];
    setp.le.s32 %p1, %r4, %r3;
    @%p1 bra $Lt_0_1026;
    .loc 14 26 0
    cvt.s64.s32 %rd1, %r3;
    mul.wide.s32 %rd2, %r3, 8;
    ld.param.u64 %rd3, [cudaparmZ6vecAddPdS_S_i_a];
    add.u64 %rd4, %rd3, %rd2;
    ld.global.f64 %fd1, [%rd4+0];
    ld.param.u64 %rd5, [cudaparmZ6vecAddPdS_S_i_b];
    add.u64 %rd6, %rd5, %rd2;
    ld.global.f64 %fd2, [%rd6+0];
    add.f64 %fd3, %fd1, %fd2;
    ld.param.u64 %rd7, [cudaparmZ6vecAddPdS_S_i_c];
    add.u64 %rd8, %rd7, %rd2;
    st.global.f64 [%rd8+0], %fd3;
$Lt_0_1026:
    .loc 14 27 0
    exit;
$LDWendZ6vecAddPdS_S_i:
}

I then copied _1.ptx to the folder used with GPGPU-Sim 4.0, and the applications ran successfully under GPGPU-Sim 4.0 without getting stuck. So I suspect the run originally stalled while generating _1.ptx, and that if _1.ptx already exists, it does not get stuck in ptx_parse(). However, I am not sure whether it is acceptable to use a _1.ptx file generated by GPGPU-Sim 3.0. Could anyone help me with this? For reference, here is the output when running this way:

######### cuobjdump parser ########
Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_13
Adding identifier: vectoradd.cu
Adding new section ELF
Adding arch: sm_13
Adding identifier: vectoradd.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z6vecAddPdS_S_i : hostFun 0x0x400c40, fat_cubin_handle = 1//stopped here before
GPGPU-Sim PTX: instruction assembly for function '_Z6vecAddPdS_S_i'... done.
GPGPU-Sim PTX: Warning %rh0 was declared previous at _1.ptx:55 skipping new declaration
GPGPU-Sim PTX: Warning %rh1 was declared previous at _1.ptx:55 skipping new declaration
GPGPU-Sim PTX: Warning %rh2 was declared previous at _1.ptx:55 skipping new declaration
GPGPU-Sim PTX: Warning %rh3 was declared previous at _1.ptx:55 skipping new declaration
GPGPU-Sim PTX: Warning %r0 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %r1 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %r2 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %r3 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %r4 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %r5 was declared previous at _1.ptx:56 skipping new declaration
GPGPU-Sim PTX: Warning %rd0 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd1 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd2 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd3 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd4 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd5 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd6 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd7 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd8 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %rd9 was declared previous at _1.ptx:57 skipping new declaration
GPGPU-Sim PTX: Warning %fd0 was declared previous at _1.ptx:58 skipping new declaration
GPGPU-Sim PTX: Warning %fd1 was declared previous at _1.ptx:58 skipping new declaration
GPGPU-Sim PTX: Warning %fd2 was declared previous at _1.ptx:58 skipping new declaration
GPGPU-Sim PTX: Warning %fd3 was declared previous at _1.ptx:58 skipping new declaration
GPGPU-Sim PTX: Warning %fd4 was declared previous at _1.ptx:58 skipping new declaration
GPGPU-Sim PTX: Warning %p0 was declared previous at _1.ptx:59 skipping new declaration
GPGPU-Sim PTX: Warning %p1 was declared previous at _1.ptx:59 skipping new declaration
GPGPU-Sim PTX: Warning %p2 was declared previous at _1.ptx:59 skipping new declaration
GPGPU-Sim PTX: instruction assembly for function '_Z6vecAddPdS_S_i'... done.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
self exe links to: vectoradd
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_xnYjaY"
Running: cat _ptx_xnYjaY | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/(.extern .const[1] .b8 \w+)[]/\1[1]/' | sed 's/const[.]/const[0]/g' > _ptx2_TklIcI
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_13 -v _ptx2_TklIcI --output-file /dev/null 2> _ptx_xnYjaYinfo"

In addition, I noticed that if I type "exit" while it is stuck in ptx_parse(), I can get past this point and the application continues executing, but _1.ptx is not generated. The output in that case is shown below. Again, I am not sure whether using "exit" is acceptable, so I would really appreciate it if anyone knows more about this.

######### cuobjdump parser ########
Adding new section PTX
Adding ptx filename: _cuobjdump_1.ptx
Adding arch: sm_13
Adding identifier: vectoradd.cu
Adding new section ELF
Adding arch: sm_13
Adding identifier: vectoradd.cu
Done parsing!!!
GPGPU-Sim PTX: __cudaRegisterFunction _Z6vecAddPdS_S_i : hostFun 0x0x400c40, fat_cubin_handle = 1//stopped here before
exit
_1.ptx:1 Syntax error:

^

GPGPU-Sim PTX: instruction assembly for function '_Z6vecAddPdS_S_i'... done.
GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file _1.ptx
self exe links to: vectoradd
GPGPU-Sim PTX: extracting embedded .ptx to temporary file "_ptx_w9yRta"
Running: cat _ptx_w9yRta | sed 's/.version 1.5/.version 1.4/' | sed 's/, texmode_independent//' | sed 's/(.extern .const[1] .b8 \w+)[]/\1[1]/' | sed 's/const[.]/const[0]/g' > _ptx2_XACqHc
GPGPU-Sim PTX: generating ptxinfo using "$CUDA_INSTALL_PATH/bin/ptxas --gpu-name=sm_13 -v _ptx2_XACqHc --output-file /dev/null 2> _ptx_w9yRtainfo"

quadpixels commented 4 months ago

I encountered this error too, and I found it was because I had an extra 0x00 at the end of the PTX byte file. Maybe you can check your file as well?
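
A minimal sketch of how one might check for that, assuming the stray bytes sit at the very end of the file. The helper names `count_trailing_nuls` and `strip_trailing_nuls` are made up for illustration and are not part of GPGPU-Sim; the script only inspects and rewrites the file you pass it, so keep a backup first.

```python
from pathlib import Path


def count_trailing_nuls(data: bytes) -> int:
    """Return how many 0x00 bytes sit at the very end of data."""
    return len(data) - len(data.rstrip(b"\x00"))


def strip_trailing_nuls(path: Path) -> int:
    """Remove trailing 0x00 bytes from the file in place.

    Returns the number of bytes removed (0 if the file was already clean).
    """
    data = path.read_bytes()
    extra = count_trailing_nuls(data)
    if extra:
        path.write_bytes(data[: len(data) - extra])
    return extra


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python strip_nuls.py _cuobjdump_1.ptx
    for name in sys.argv[1:]:
        removed = strip_trailing_nuls(Path(name))
        print(f"{name}: removed {removed} trailing NUL byte(s)")
```

Whether removing the byte is enough to get past ptx_parse() in your setup is something you would need to confirm by rerunning the application.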

pipijing13 commented 4 months ago

> I encountered this error too, and I found it was because I had an extra 0x00 at the end of the PTX byte file. Maybe you can check your file as well?

Thank you so much! I will try it. Do you mean the file named "_cuobjdump_1.ptx" generated in the application folder?
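
If it is unclear which file is affected, one option is to scan every .ptx file in the application folder rather than guessing. The sketch below is only an illustration under that assumption: `find_ptx_with_nuls` is a hypothetical helper, and it merely reports files containing 0x00 bytes without modifying anything.

```python
from pathlib import Path


def find_ptx_with_nuls(folder: Path) -> dict:
    """Map each *.ptx file name in folder to its count of 0x00 bytes.

    Only files containing at least one 0x00 byte are included.
    """
    hits = {}
    for ptx in sorted(folder.glob("*.ptx")):
        nuls = ptx.read_bytes().count(b"\x00")
        if nuls:
            hits[ptx.name] = nuls
    return hits


if __name__ == "__main__":
    import sys

    # Default to the current directory, e.g. the folder the application ran in.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, nuls in find_ptx_with_nuls(folder).items():
        print(f"{name}: {nuls} NUL byte(s)")
```

The glob pattern would pick up both _cuobjdump_1.ptx and _1.ptx if they are present in the folder.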

yao1yao1yao1 commented 4 months ago

I just started using GPGPU-Sim recently. How do I execute different applications on GPGPU-Sim? Could you share the complete code for running the ispass-2009 RAY and BFS benchmarks at the same time?