microyybar opened this issue 1 year ago.
Thanks for the issue @microyybar, but it is not clear from the description what is happening. Does this happen at compile time? Can you share any context, such as system details, logs, or environment?
The same "ptxas error: stack size ... cannot be statically determined" error appears in my environments (with both cuml 23.06 and 23.04). It occurs only in Debug builds; Release builds compile successfully. The error message is below:
[147/269] Building CUDA object CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
FAILED: CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
/usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -DCUML_CPP_API -DCUML_ENABLE_GPU -DCUTLASS_NAMESPACE=raft_cutlass -DDISABLE_CUSPARSE_DEPRECATED -DFMT_HEADER_ONLY=1 -DFMT_SHARED -DRAFT_COMPILED -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -Dcuml___EXPORTS -I/home/jinfengl/project/study/cuml/cpp/include -I/home/jinfengl/project/study/cuml/cpp/src -I/home/jinfengl/project/study/cuml/cpp/src/metrics -I/home/jinfengl/project/study/cuml/cpp/src_prims -I/home/jinfengl/miniconda3/envs/cuml_dev_conda/include/rapids -I/home/jinfengl/miniconda3/envs/cuml_dev_conda/include/rapids/libcudacxx -I/home/jinfengl/project/study/cuml/cpp/build/_deps/gputreeshap-src -isystem /home/jinfengl/miniconda3/envs/cuml_dev_conda/include -isystem /usr/local/cuda-11.6/include -g -std=c++17 --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_86,code=[sm_86] -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Werror=all-warnings -Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations,-Wno-error=sign-compare -Wno-deprecated-declarations -Xcompiler=-Wno-deprecated-declarations -Xfatbin=-compress-all -Xcompiler=-fopenmp -G -Xcompiler=-rdynamic -Xcompiler -pthread -MD -MT CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o -MF CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o.d -x cu -c /home/jinfengl/project/study/cuml/cpp/src/kmeans/kmeans_transform.cu -o CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
ptxas error : Stack size for entry function 
'_ZN12raft_cutlass6KernelINS_4gemm6kernel21GemmWithFusedEpilogueINS1_11threadblock13MmaMultistageINS1_9GemmShapeILi128ELi128ELi16EEENS_9transform11threadblock28PredicatedTileAccessIteratorINS_11MatrixShapeILi128ELi16EEEfNS_6layout11ColumnMajorELi1ENS8_29PitchLinearWarpRakedThreadMapINS_16PitchLinearShapeILi128ELi16EEELi128ENSG_ILi8ELi4EEELi4EEENS_5ArrayIfLi2ELb1EEELb0EEENS9_25RegularTileAccessIteratorISC_fNSD_40ColumnMajorTensorOpMultiplicandCongruousILi32ELi32EEELi1ESJ_Li16EEELNS_4arch14CacheOperation4KindE0ENSA_INSB_ILi16ELi128EEEfNSD_8RowMajorELi0ESJ_SL_Lb0EEENSN_ISU_fNSD_37RowMajorTensorOpMultiplicandCongruousILi32ELi32EEELi0ESJ_Li16EEELST_0EfSV_NS4_9MmaPolicyINS1_4warp18MmaTensorOpFastF32INS6_ILi64ELi64ELi16EEEfSP_fSY_fSV_NS11_17MmaTensorOpPolicyINSR_3MmaINS6_ILi16ELi8ELi4EEELi32ENS_10tfloat32_tESV_S17_SE_fSV_NSR_13OpMultiplyAddEEENSB_ILi1ELi1EEEEELi1ELb0EbEENSB_ILi0ELi0EEES1D_Li1EEELi3ELNS1_23SharedMemoryClearOptionE0EbEENS_8epilogue11threadblock21EpilogueWithBroadcastIS7_S1C_Li1ENS1I_29PredicatedTileIteratorNormVecINS1I_26OutputTileOptimalThreadMapINS1I_15OutputTileShapeILi128ELi8ELi2ELi1ELi1EEENS1M_ILi1ELi8ELi1ELi1ELi8EEELi128ELi1ELi32EEEfSE_Lb0ELb0EEENS1I_22PredicatedTileIteratorIS1P_fLb0ELb0EEEfNS1H_4warp24FragmentIteratorTensorOpIS13_S16_fNSK_IfLi4ELb1EEESV_EENS1T_20TileIteratorTensorOpIS13_S16_fSV_EENS1I_18SharedLoadIteratorINS1P_18CompactedThreadMapEfLi4EEENS1H_6thread35PairwiseDistanceEpilogueElementwiseIfffffLi1EN4raft8distance6detail3ops17l2_exp_cutlass_opIffEENS24_11identity_opEEENSB_ILi0ELi8EEELi1ELi1EEENS4_30GemmIdentityThreadblockSwizzleILi1EEEEEEEvNT_6ParamsE' cannot be statically determined
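One plausible reading of this log, offered as an assumption rather than a confirmed diagnosis: ptxas normally reports "Stack size for entry function ... cannot be statically determined" as a warning, and the command above passes -Werror=all-warnings, which tells nvcc to treat warnings as errors. The command also contains -G (device-side debug info), which is presumably the flag that differs between the Debug and Release builds described above. The helper below is a minimal sketch for checking a build log and an nvcc command for exactly these two things. The function names find_stack_size_diagnostics and relevant_nvcc_flags are hypothetical and are not part of cuml, raft, or the CUDA toolkit.

```python
import re
import shlex

# Matches ptxas stack-size diagnostics such as the one in the log above.
# Assumption: the warning variant uses the same text with "warning" in place
# of "error". \s+ also tolerates the line break before the mangled name.
PTXAS_STACK_RE = re.compile(
    r"ptxas (?P<severity>error|warning)\s*:\s*"
    r"Stack size for entry function\s+'(?P<kernel>[^']+)'\s+"
    r"cannot be statically determined"
)


def find_stack_size_diagnostics(log_text):
    """Hypothetical helper: return (severity, mangled_kernel_name) pairs."""
    return [
        (m.group("severity"), m.group("kernel"))
        for m in PTXAS_STACK_RE.finditer(log_text)
    ]


def relevant_nvcc_flags(command):
    """Hypothetical helper: report whether the two flags assumed to matter
    here appear as standalone arguments of an nvcc command line."""
    args = shlex.split(command)
    return {
        # -G: generate device debug info (seen in the Debug build above).
        "device_debug": "-G" in args,
        # -Werror=all-warnings: nvcc treats all warnings as errors.
        "warnings_as_errors": "-Werror=all-warnings" in args,
    }


if __name__ == "__main__":
    # '_Z11demo_kernelv' is a placeholder name, not the kernel from the log.
    sample_log = (
        "ptxas error : Stack size for entry function\n"
        "'_Z11demo_kernelv' cannot be statically determined"
    )
    print(find_stack_size_diagnostics(sample_log))
    print(relevant_nvcc_flags("nvcc -g -G -Werror=all-warnings -c demo.cu"))
```

Applied to the nvcc command in the log above, relevant_nvcc_flags would report both flags as present. Whether removing either flag lets the Debug build succeed has not been tested here.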
cuml 22.12, built with: ./build.sh libcuml -g (the -g option requests a Debug build)