mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
https://torchsparse.mit.edu
MIT License
1.22k stars 143 forks

[BUG] <numBlocks in Y dimension is larger than needed for FetchOnDemand_no_fusion> #323

Open yokosyun opened 3 months ago

yokosyun commented 3 months ago

Is there an existing issue for this?

Current Behavior

fetch_on_demand_gemm_no_fusion launches the wrong numBlocks in the Y dimension, so unnecessary blocks are executed.

cur_nnz is divided only by 16 (BLOCK_SIZE):

fetch_on_demand_gemm_no_fusion_fp32_1<16, 4, 8>
            <<<dim3(DIV_UP(out_channel, 16), DIV_UP(cur_nnz, 16), 1),
               dim3(16, 16, 1)>>>

Expected Behavior

Since each thread processes N_LOOP rows, cur_nnz must be divided by 16 (BLOCK_SIZE) * 4 (N_LOOP) to get the correct numBlocks in the Y dimension:

fetch_on_demand_gemm_no_fusion_fp32_1<16, 4, 8>
            <<<dim3(DIV_UP(out_channel, 16), DIV_UP(cur_nnz, 16 * N_LOOP), 1),
               dim3(16, 16, 1)>>>
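To make the over-allocation concrete, here is a small sketch of the grid-size arithmetic (values assumed from the template arguments above: BLOCK_SIZE = 16, N_LOOP = 4; `div_up` mirrors the DIV_UP macro; `cur_nnz = 1000` is just an illustrative count):

```python
def div_up(a, b):
    # Ceiling division, equivalent to the DIV_UP macro in the kernel launch.
    return (a + b - 1) // b

BLOCK_SIZE = 16  # from the <16, 4, 8> template arguments
N_LOOP = 4

cur_nnz = 1000  # hypothetical nonzero count, for illustration only

# Current launch: one Y-block per BLOCK_SIZE rows.
blocks_current = div_up(cur_nnz, BLOCK_SIZE)            # 63 blocks in Y

# Expected launch: each block covers BLOCK_SIZE * N_LOOP rows,
# since every thread iterates over N_LOOP rows.
blocks_expected = div_up(cur_nnz, BLOCK_SIZE * N_LOOP)  # 16 blocks in Y

print(blocks_current, blocks_expected)  # roughly N_LOOP times fewer blocks
```

So the current launch dispatches roughly N_LOOP times more Y-blocks than the kernel actually needs; the extra blocks do no useful work.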

Environment

- GCC:
- NVCC:
- PyTorch:
- PyTorch CUDA:
- TorchSparse:

Anything else?

Can we submit a bugfix PR for this?