LeiWang1999 opened 1 year ago
1. Add support for `int16_t` loads (BLOOM fp16 model).
2. Register fusion pass (Welder): for a fused node with multiple outputs, the current code assigns the output edges incorrectly, which can produce wrong results in some cases.
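3. Rewrite the `CUDA_ARCH` string in the CUDA codegen `CMakeLists.txt` in a more friendly way.

With the current way of specifying targets:

```
-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86
```

if we want to use a feature that only exists on newer architectures (e.g. sm_86), we have to comment out the lower-arch `-gencode` flags; otherwise compilation fails:

```
ptxas /tmp/tmpxft_0000e00e_00000000-11_nnfusion_rt.compute_60.ptx, line 43059; error : Feature '.m16n8k16' requires .target sm_80 or higher
```

With the new `SET(CUDA_ARCH ...)` approach:

```cmake
SET(CUDA_ARCH "-gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_80,code=compute_80" CACHE STRING "target architecture")
```

we no longer have this concern: `code=compute_XX` embeds only PTX for each target, so `ptxas` is not run against the older architectures at build time, and the driver JIT-compiles a compatible PTX version for the actual GPU when the module is loaded (at the cost of JIT compilation on first load).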
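With PTX-only targets, which PTX actually runs is decided at load time rather than at build time. The sketch below models that choice. It rests on an assumption, not on the driver's source: that the driver JIT-compiles the embedded PTX with the highest virtual architecture not exceeding the device's compute capability. `pick_ptx_target` is a hypothetical name, and architectures are written as plain integers (`80` for `compute_80`, `86` for an sm_86 GPU).

```cpp
#include <cassert>
#include <vector>

// Simplified model (assumption): with PTX-only targets (code=compute_XX),
// the driver JIT-compiles the embedded PTX whose virtual architecture is the
// highest one that does not exceed the device's compute capability.
// Returns -1 when no embedded PTX is compatible with the device.
int pick_ptx_target(const std::vector<int>& embedded_ptx_archs, int device_sm) {
    assert(device_sm > 0 && "compute capability must be positive");
    int best = -1;
    for (int arch : embedded_ptx_archs) {
        // PTX can only be JIT-compiled for a device at least as new as its target.
        if (arch <= device_sm && arch > best) {
            best = arch;
        }
    }
    return best;
}
```

Under this model, with the five targets in the `SET` above (60, 61, 70, 75, 80), an sm_86 GPU would get the `compute_80` PTX, the one whose `.target` permits the `m16n8k16` MMA instructions, while the `compute_60` PTX that tripped `ptxas` is never assembled for that device.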
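4. Fix a bug in `cuda::FusionCudaEmitter::set_launch_config()`: the grid's z dimension is read from `grid[1]`, so `m_gridDim.z` silently takes the y extent.

```cpp
void cuda::FusionCudaEmitter::set_launch_config()
{
    auto block = m_fusion_group["block_size"];
    auto grid = m_fusion_group["grid_size"];
    block[0].get_to(m_blockDim.x);
    block[1].get_to(m_blockDim.y);
    block[2].get_to(m_blockDim.z);
    grid[0].get_to(m_gridDim.x);
    grid[1].get_to(m_gridDim.y);
    grid[1].get_to(m_gridDim.z); // bug: reads the y entry for z
}
```

The last line should be:

```cpp
grid[2].get_to(m_gridDim.z);
```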
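One way to make this kind of index slip harder to write is to do the three-element copy in a single helper and call it for both `block_size` and `grid_size`. This is a hypothetical sketch, not code from the repository: it uses `std::vector<unsigned>` in place of the JSON values read from `m_fusion_group` and a minimal `Dim3` struct in place of CUDA's `dim3`, so it builds without a JSON library or the CUDA toolkit. `to_dim3`, `LaunchConfig`, and `make_launch_config` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without CUDA.
struct Dim3 {
    unsigned x = 1, y = 1, z = 1;
};

// Copy a 3-element launch-config entry into a Dim3. The indexing lives in
// exactly one place, so a grid[1] -> z mix-up cannot reappear per call site.
Dim3 to_dim3(const std::vector<unsigned>& v) {
    assert(v.size() == 3 && "launch config must have exactly 3 dimensions");
    Dim3 d;
    d.x = v[0];
    d.y = v[1];
    d.z = v[2];
    return d;
}

struct LaunchConfig {
    Dim3 block;
    Dim3 grid;
};

// Sketch of set_launch_config() routing both dimensions through the helper.
LaunchConfig make_launch_config(const std::vector<unsigned>& block_size,
                                const std::vector<unsigned>& grid_size) {
    return LaunchConfig{to_dim3(block_size), to_dim3(grid_size)};
}
```

For example, `make_launch_config({128, 1, 1}, {4, 8, 16})` yields a grid of (4, 8, 16), whereas the original code would have launched (4, 8, 8).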