Open sleepwalker2017 opened 3 months ago
proton should be included by default in recent releases. Make sure you use an up-to-date version of triton.
See here for some proton documentation: https://github.com/triton-lang/triton/tree/main/third_party/proton
Thank you. I installed the latest triton nightly version, and it works now.
But what is the output of proton?
I ran this:
python 09-persistent-matmul.py
proton-viewer -m time/s matmul.hatchet
Warning: Roundtrip module could not be loaded. Requires jupyter notebook version <= 7.x.
0.003 ROOT
├─ 0.000 _ZN2at6native18elementwise_kernelILi128ELi4EZNS0_22gpu_kernel_impl_nocastIZNS0_23float8_copy_kernel_cudaERNS_18TensorIteratorBaseEEUlN3c1013Float8_e4m3fnEE_EEvS4_RKT_EUliE_EEviT1_
├─ 0.000 _ZN2at6native29vectorized_elementwise_kernelILi4ENS0_11FillFunctorIN3c1013Float8_e4m3fnEEENS_6detail5ArrayIPcLi1EEEEEviT0_T1_
├─ 0.000 _ZN2at6native29vectorized_elementwise_kernelILi4EZNS0_23float8_copy_kernel_cudaERNS_18TensorIteratorBaseEEUlN3c104HalfEE_NS_6detail5ArrayIPcLi2EEEEEviT0_T1_
├─ 0.000 _ZN2at6native54_GLOBAL__N__d8ceb000_21_DistributionNormal_cu_0c5b6e8543distribution_elementwise_grid_stride_kernelIfLi4EZNS0_9templates4cuda20normal_and_transformIN3c104HalfEfLm4EPNS_17CUDAGeneratorImplEZZZNS4_13normal_kernelIS9_EEvRKNS_10TensorBaseEddT_ENKUlvE_clEvENKUlvE1_clEvEUlfE_EEvRNS_18TensorIteratorBaseET2_T3_EUlP24curandStatePhilox4_32_10E0_ZNS1_27distribution_nullary_kernelIS7_fLi4ES9_SO_SH_EEvSJ_SK_RKSL_T4_EUlifE_EEviNS_15PhiloxCudaStateET1_SK_
├─ 0.000 cublas M=256, N=5120, K=13824
│ └─ 0.000 sm90_xmma_gemm_e4m3e4m3_e4m3f32_f32_tn_n_tilesize128x128x128_warpgroupsize1x1x1_bias_f16_execute_segment_k_off_kernel__5x_cublas
├─ 0.000 flush_TMA_cache
├─ 0.001 matmul_kernel [M=256, N=5120, K=13824]
├─ 0.001 matmul_kernel_persistent [M=256, N=5120, K=13824]
└─ 0.001 matmul_kernel_tma_persistent [M=256, N=5120, K=13824]
Legend (Metric: time/s (inc) Min: 0.00 Max: 0.00)
█ 0.00 - 0.00
█ 0.00 - 0.00
█ 0.00 - 0.00
█ 0.00 - 0.00
█ 0.00 - 0.00
█ 0.00 - 0.00
Try time/ns or time/ms. The kernel times above are too small to show up at three decimal places when reported in seconds.
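To illustrate why the unit matters, here is a minimal Python sketch of the rounding effect. It is not proton's code: the three-decimal formatting is inferred from the viewer output above, `format_time` is a hypothetical helper, and the 250-microsecond duration is made up for illustration.

```python
def format_time(seconds: float, unit: str) -> str:
    """Format a duration in the requested unit with three decimal places.

    Hypothetical helper; it mimics the three-decimal display seen in the
    viewer output above and is not part of proton.
    """
    scale = {"s": 1.0, "ms": 1e3, "us": 1e6, "ns": 1e9}[unit]
    return f"{seconds * scale:.3f}"


# Made-up kernel duration of 250 microseconds (not a value from this thread).
duration_s = 250e-6

# The same duration vanishes in seconds but is readable in ms or ns.
for unit in ("s", "ms", "ns"):
    print(f"time/{unit}: {format_time(duration_s, unit)}")
```

With a sub-millisecond kernel, the seconds column rounds to zero while the millisecond and nanosecond columns keep the useful digits, which is why switching the metric passed to -m makes the tree readable.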
How do I install proton using pip?
I tried to run the 09-persistent-matmul.py example, but it complains.
How can I fix it?
And why use this to benchmark kernels? What is its output? I didn't see it in the documentation.