rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

ptxas error : Stack size for entry function ... cannot be statically determined #5047

Open microyybar opened 1 year ago

microyybar commented 1 year ago

With cuML 22.12, running ./build.sh libcuml -g (a debug build of libcuml) fails with this error.

dantegd commented 1 year ago

Thanks for the issue @microyybar, but the description doesn't make clear what is happening. Does it happen at compilation time? Can you share any context, such as your system, logs, or environment?

lijinf2 commented 1 year ago

I see the same "ptxas error: stack size ... cannot be statically determined" in my environments, with both cuML 23.06 and 23.04. It appears only when compiling in Debug mode; Release mode compiles successfully. Below is the error message:

[147/269] Building CUDA object CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
FAILED: CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
/usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -DCUML_CPP_API -DCUML_ENABLE_GPU -DCUTLASS_NAMESPACE=raft_cutlass -DDISABLE_CUSPARSE_DEPRECATED -DFMT_HEADER_ONLY=1 -DFMT_SHARED -DRAFT_COMPILED -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -Dcuml___EXPORTS -I/home/jinfengl/project/study/cuml/cpp/include -I/home/jinfengl/project/study/cuml/cpp/src -I/home/jinfengl/project/study/cuml/cpp/src/metrics -I/home/jinfengl/project/study/cuml/cpp/src_prims -I/home/jinfengl/miniconda3/envs/cuml_dev_conda/include/rapids -I/home/jinfengl/miniconda3/envs/cuml_dev_conda/include/rapids/libcudacxx -I/home/jinfengl/project/study/cuml/cpp/build/_deps/gputreeshap-src -isystem /home/jinfengl/miniconda3/envs/cuml_dev_conda/include -isystem /usr/local/cuda-11.6/include -g -std=c++17 --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_86,code=[sm_86] -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Werror=all-warnings -Xcompiler=-Wall,-Werror,-Wno-error=deprecated-declarations,-Wno-error=sign-compare -Wno-deprecated-declarations -Xcompiler=-Wno-deprecated-declarations -Xfatbin=-compress-all -Xcompiler=-fopenmp -G -Xcompiler=-rdynamic -Xcompiler -pthread -MD -MT CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o -MF CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o.d -x cu -c /home/jinfengl/project/study/cuml/cpp/src/kmeans/kmeans_transform.cu -o CMakeFiles/cuml++.dir/src/kmeans/kmeans_transform.cu.o
ptxas error : Stack size for entry function '_ZN12raft_cutlass6KernelINS_4gemm6kernel21GemmWithFusedEpilogueINS1_11threadblock13MmaMultistageINS1_9GemmShapeILi128ELi128ELi16EEENS_9transform11threadblock28PredicatedTileAccessIteratorINS_11MatrixShapeILi128ELi16EEEfNS_6layout11ColumnMajorELi1ENS8_29PitchLinearWarpRakedThreadMapINS_16PitchLinearShapeILi128ELi16EEELi128ENSG_ILi8ELi4EEELi4EEENS_5ArrayIfLi2ELb1EEELb0EEENS9_25RegularTileAccessIteratorISC_fNSD_40ColumnMajorTensorOpMultiplicandCongruousILi32ELi32EEELi1ESJ_Li16EEELNS_4arch14CacheOperation4KindE0ENSA_INSB_ILi16ELi128EEEfNSD_8RowMajorELi0ESJ_SL_Lb0EEENSN_ISU_fNSD_37RowMajorTensorOpMultiplicandCongruousILi32ELi32EEELi0ESJ_Li16EEELST_0EfSV_NS4_9MmaPolicyINS1_4warp18MmaTensorOpFastF32INS6_ILi64ELi64ELi16EEEfSP_fSY_fSV_NS11_17MmaTensorOpPolicyINSR_3MmaINS6_ILi16ELi8ELi4EEELi32ENS_10tfloat32_tESV_S17_SE_fSV_NSR_13OpMultiplyAddEEENSB_ILi1ELi1EEEEELi1ELb0EbEENSB_ILi0ELi0EEES1D_Li1EEELi3ELNS1_23SharedMemoryClearOptionE0EbEENS_8epilogue11threadblock21EpilogueWithBroadcastIS7_S1C_Li1ENS1I_29PredicatedTileIteratorNormVecINS1I_26OutputTileOptimalThreadMapINS1I_15OutputTileShapeILi128ELi8ELi2ELi1ELi1EEENS1M_ILi1ELi8ELi1ELi1ELi8EEELi128ELi1ELi32EEEfSE_Lb0ELb0EEENS1I_22PredicatedTileIteratorIS1P_fLb0ELb0EEEfNS1H_4warp24FragmentIteratorTensorOpIS13_S16_fNSK_IfLi4ELb1EEESV_EENS1T_20TileIteratorTensorOpIS13_S16_fSV_EENS1I_18SharedLoadIteratorINS1P_18CompactedThreadMapEfLi4EEENS1H_6thread35PairwiseDistanceEpilogueElementwiseIfffffLi1EN4raft8distance6detail3ops17l2_exp_cutlass_opIffEENS24_11identity_opEEENSB_ILi0ELi8EEELi1ELi1EEENS4_30GemmIdentityThreadblockSwizzleILi1EEEEEEEvNT_6ParamsE' cannot be statically determined