gpgpu-sim / cutlass-gpgpu-sim


setup issue #5

Open acejim opened 5 years ago

acejim commented 5 years ago

Hi there,

I have successfully run GPGPU-Sim and followed the steps to run cutlass-test with it, but I got output like this:

   GPGPU-Sim PTX: cudaRegisterFatBinary, fat_cubin_handle = 2, filename=default
   GPGPU-Sim PTX: cudaRegisterFunction nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37Z16memcpy_3d_deviceImLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1S2 : hostFun 0x0x4017f0, fat_cubin_handle = 2
   GPGPU-Sim PTX: Parsing cutlass-test.1.sm_70.ptx
   GPGPU-Sim PTX: finished parsing EMBEDDED .ptx file cutlass-test.1.sm_70.ptx
   GPGPU-Sim PTX: loading globals with explicit initializers...
   GPGPU-Sim PTX: finished loading globals (0 bytes total).
   GPGPU-Sim PTX: loading constants with explicit initializers... done.
   GPGPU-Sim PTX: Loading PTXInfo from cutlass-test.1.sm_70.ptx
   Warning: cannot find deviceFun nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37Z16memcpy_3d_deviceImLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1S2
   GPGPU-Sim PTX: cudaRegisterFunction nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37__Z16memset_3d_deviceIjLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1S0 : hostFun 0x0x405dc0, fat_cubin_handle = 2
   Warning: cannot find deviceFun nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37Z16memset_3d_deviceIjLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1S0
   GPGPU-Sim PTX: cudaRegisterFunction nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37Z16memset_3d_deviceIjLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1S0 : hostFun 0x0x405fe0, fat_cubin_handle = 2
   Warning: cannot find deviceFun nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37Z16memset_3d_deviceIjLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1S0
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x693160; deviceAddress = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel32; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel32
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 64 bytes
   GPGPU-Sim PTX registering constant nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel32 (64 bytes) to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x6931a0; deviceAddress = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel64; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel64
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 64 bytes
   GPGPU-Sim PTX registering constant nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_set_kernel64 (64 bytes) to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x6931e0; deviceAddress = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel32; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel32
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 64 bytes
   GPGPU-Sim PTX registering constant nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel32 (64 bytes) to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x693220; deviceAddress = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel64; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel64
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 64 bytes
   GPGPU-Sim PTX registering constant nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cpy_kernel64 (64 bytes) to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x692640; deviceAddress = nv_static_79__66_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorTableArr; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorTableArr
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 1992 bytes
   GPGPU-Sim PTX registering global nv_static_79__66_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorTableArr hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x693140; deviceAddress = cudartErrorTable; deviceName = cudartErrorTable
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 8 bytes
   GPGPU-Sim PTX registering global cudartErrorTable hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x409000; deviceAddress = cudartErrorTableEntryCount; deviceName = cudartErrorTableEntryCount
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 4 bytes
   GPGPU-Sim PTX registering global cudartErrorTableEntryCount hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x409020; deviceAddress = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorCnpMapArr; deviceName = nv_static_7966_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorCnpMapArr
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 104 bytes
   GPGPU-Sim PTX registering global nv_static_79__66_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37_cudartErrorCnpMapArr hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x693148; deviceAddress = cudartErrorCnpMap; deviceName = cudartErrorCnpMap
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 8 bytes
   GPGPU-Sim PTX registering global cudartErrorCnpMap hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x409004; deviceAddress = cudartErrorCnpMapEntryCount; deviceName = cudartErrorCnpMapEntryCount
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 4 bytes
   GPGPU-Sim PTX registering global cudartErrorCnpMapEntryCount hostVar to name mapping
   GPGPU-Sim PTX: cudaRegisterVar: hostVar = 0x693150; deviceAddress = CNPRT_VERSION_NUMBER; deviceName = CNPRT_VERSION_NUMBER
   GPGPU-Sim PTX: cudaRegisterVar: Registering const memory space of 4 bytes
   GPGPU-Sim PTX registering global CNPRT_VERSION_NUMBER hostVar to name mapping
   GPGPU-Sim: exit detected

Could you tell me how to solve it? Thank you.

damionfan commented 3 years ago

use CFLAGS

sunlex0717 commented 3 years ago

Hi, I also got this error. Have you fixed it?


sxzhang1993 commented 3 years ago

Hello, have you resolved this issue?

jielahou commented 1 week ago

First, use CUDA 9.0 rather than CUDA 11.0, and downgrade GCC and G++ (I simply used Ubuntu 16.04, whose default GCC version is 5.4.0). After that, there are no more errors like the one above. The "Warning: cannot find deviceFun" messages still appear, but they don't seem to affect the simulation.
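To confirm that a machine matches the setup described above, you can parse the versions reported by `nvcc --version` and `gcc -dumpversion`. This is only a hypothetical helper: the function names are made up, and the expected values (CUDA 9.0, GCC 5) are taken from this comment, not from any requirement stated by the repo.

```python
import re
import shutil
import subprocess

# Assumed "known good" toolchain, taken from the comment above
# (not an official requirement of cutlass-gpgpu-sim).
EXPECTED_CUDA = (9, 0)
EXPECTED_GCC_MAJOR = 5


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, whose last line
    looks like 'Cuda compilation tools, release 9.0, V9.0.176'."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_gcc_major(text):
    """Extract the major version from `gcc -dumpversion` output, e.g. '5.4.0'."""
    m = re.match(r"\s*(\d+)", text)
    return int(m.group(1)) if m else None


def run(cmd):
    """Return the stdout of cmd, or '' if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def check_toolchain():
    cuda = parse_nvcc_release(run(["nvcc", "--version"]))
    gcc = parse_gcc_major(run(["gcc", "-dumpversion"]))
    print(f"nvcc release: {cuda} (expected {EXPECTED_CUDA})")
    print(f"gcc major:    {gcc} (expected {EXPECTED_GCC_MAJOR})")
    return cuda == EXPECTED_CUDA and gcc == EXPECTED_GCC_MAJOR


if __name__ == "__main__":
    ok = check_toolchain()
    print("matches reported setup" if ok else "differs from reported setup")
```

A mismatch here does not necessarily mean the simulation will fail. It only means your toolchain differs from the one this comment reports as working.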

After switching to CUDA 9.0, running this repo's cutlass may fail with an error like:

cutlass-test.1.sm_70.ptx:347 Syntax error:

   wmma.load.a.sync.col.m16n16k16.f16 {%r590, %r589, %r588, %r587, %r586, %r585, %r584, %r583}, [%rd66], %r374;
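The failing instruction has no `.aligned` qualifier after `.sync`, while the existing ptx.l rules for these directives (the unmarked lines in the snippet below) only accept the `.sync.aligned` form. As a rough diagnostic, a sketch like the following lists such instructions in a PTX file. You could run it on the cutlass-test.1.sm_70.ptx named in the error if that file is on disk, or on PTX extracted with `cuobjdump -ptx`. The helper and its regex are assumptions and are not part of GPGPU-Sim.

```python
import re
import sys

# Hypothetical pattern: a wmma load/store/mma op, then ".sync", then any
# dotted qualifiers (layout, shape, type), e.g. "wmma.load.a.sync.col.m16n16k16.f16".
WMMA_RE = re.compile(r"\bwmma\.(load\.[abc]|store\.d|mma)\.sync((?:\.\w+)*)")


def find_unaligned_wmma(ptx_text):
    """Return (line_number, instruction) pairs for wmma instructions whose
    qualifiers after .sync do not start with .aligned."""
    hits = []
    for lineno, line in enumerate(ptx_text.splitlines(), start=1):
        for m in WMMA_RE.finditer(line):
            if not m.group(2).startswith(".aligned"):
                hits.append((lineno, m.group(0)))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_unaligned_wmma.py cutlass-test.1.sm_70.ptx
    with open(sys.argv[1]) as f:
        for lineno, instr in find_unaligned_wmma(f.read()):
            print(f"{sys.argv[1]}:{lineno}: {instr}")
```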

My solution was to modify src/cuda-sim/ptx.l around line 172, adding the lines marked with +:

<INITIAL,NOT_OPCODE,IN_INST,IN_FUNC_DECL>{
\.a\.sync\.aligned TC; yylval->int_value = LOAD_A; return WMMA_DIRECTIVE;
\.b\.sync\.aligned TC; yylval->int_value = LOAD_B; return WMMA_DIRECTIVE;
\.c\.sync\.aligned TC; yylval->int_value = LOAD_C; return WMMA_DIRECTIVE;
\.d\.sync\.aligned TC; yylval->int_value = STORE_D; return WMMA_DIRECTIVE;
+ \.a\.sync TC; yylval->int_value = LOAD_A; return WMMA_DIRECTIVE;
+ \.b\.sync TC; yylval->int_value = LOAD_B; return WMMA_DIRECTIVE;
+ \.c\.sync TC; yylval->int_value = LOAD_C; return WMMA_DIRECTIVE;
+ \.d\.sync TC; yylval->int_value = STORE_D; return WMMA_DIRECTIVE;
\.mma\.sync\.aligned TC;yylval->int_value=MMA; return WMMA_DIRECTIVE;
+ \.mma\.sync TC;yylval->int_value=MMA; return WMMA_DIRECTIVE;

After that, it runs in my environment.
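Adding the shorter `.sync` rules next to the `.sync.aligned` ones is safe because flex always picks the longest match and breaks ties by rule order. Input with `.a.sync.aligned` therefore still hits the original rule, and `.a.sync.col` now hits the new one. The Python sketch below emulates that rule selection for the patterns in the patch. It assumes the lexer is already positioned at the suffix these rules match (e.g. `.a.sync.col...`), and the token names are plain strings, not GPGPU-Sim's actual token values.

```python
import re

# Patterns copied from the ptx.l rules above, paired with the value each
# rule assigns (as strings, for illustration only).
ORIGINAL_RULES = [
    (r"\.a\.sync\.aligned", "LOAD_A"),
    (r"\.b\.sync\.aligned", "LOAD_B"),
    (r"\.c\.sync\.aligned", "LOAD_C"),
    (r"\.d\.sync\.aligned", "STORE_D"),
    (r"\.mma\.sync\.aligned", "MMA"),
]
PATCHED_RULES = ORIGINAL_RULES + [
    (r"\.a\.sync", "LOAD_A"),
    (r"\.b\.sync", "LOAD_B"),
    (r"\.c\.sync", "LOAD_C"),
    (r"\.d\.sync", "STORE_D"),
    (r"\.mma\.sync", "MMA"),
]


def longest_match(rules, text, pos=0):
    """Emulate flex's rule choice at `pos`: the longest match wins, and on a
    tie the rule listed first wins. Returns (token, lexeme) or None."""
    best = None
    for pattern, token in rules:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' keeps the earlier rule when lengths are equal.
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (token, m.group(0))
    return best


if __name__ == "__main__":
    for name, rules in (("original", ORIGINAL_RULES), ("patched", PATCHED_RULES)):
        for suffix in (".a.sync.col.m16n16k16.f16", ".a.sync.aligned.col.m16n16k16.f16"):
            print(name, suffix, "->", longest_match(rules, suffix))
```

With the original rules, `.a.sync.col...` matches nothing, which is consistent with the syntax error above. With the patched rules, it is recognized as `LOAD_A`, and inputs that include `.aligned` still take the longer original rule.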