usnistgov / hiperc

High Performance Computing Strategies for Boundary Value Problems
https://pages.nist.gov/hiperc/en/latest/index.html

dynamic tile size #113

Closed tkphd closed 6 years ago

tkphd commented 6 years ago

Shared tiles on the GPU are statically allocated, e.g.

__shared__ fp_t d_conc_tile[TILE_H + MAX_MASK_H - 1][TILE_W + MAX_MASK_W - 1];

This is unnecessary, since CUDA and OpenCL natively support dynamic shared memory allocation, e.g.

extern __shared__ fp_t d_conc_tile[];

The dynamic array size, in bytes, is passed as the third argument of the kernel execution configuration, e.g.

const size_t smem_size = (tileWidth + maskWidth - 1) * (tileHeight + maskHeight - 1) * sizeof(fp_t);
convolution_kernel<<<blocks,threads,smem_size>>>(...);
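
For illustration, here is a minimal, self-contained sketch of how a tiled convolution might use a dynamically sized shared tile. It is not the hiperc implementation: the kernel signature, the zero-padded boundary, the `double` typedef for `fp_t`, the problem sizes, and the runtime tile dimensions `tw` and `th` (standing in for a value read from params.txt) are all assumptions made for the example. Note that an unsized `extern __shared__` array is one-dimensional, so the 2D index has to be flattened by hand, and the launch size is given in bytes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef double fp_t;  // assumption: the real fp_t may be float or double

// Each block loads a (tile + halo) patch into dynamic shared memory, then the
// interior threads each compute one output value. The block dimensions are
// (tw + mw - 1) x (th + mh - 1), so every thread loads exactly one element.
__global__ void convolution_kernel(const fp_t* in, fp_t* out, const fp_t* mask,
                                   int nx, int ny, int mw, int mh)
{
    extern __shared__ fp_t tile[];        // size set at launch, in bytes

    const int sw = blockDim.x;            // shared tile width, halo included
    const int tw = blockDim.x - mw + 1;   // output tile width
    const int th = blockDim.y - mh + 1;   // output tile height
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;

    const int ox = blockIdx.x * tw + tx;  // output coordinates
    const int oy = blockIdx.y * th + ty;
    const int ix = ox - mw / 2;           // input coordinates, shifted by the halo
    const int iy = oy - mh / 2;

    // Flattened 2D index into the 1D dynamic array; zero outside the domain.
    tile[ty * sw + tx] = (ix >= 0 && ix < nx && iy >= 0 && iy < ny)
                       ? in[iy * nx + ix] : fp_t(0);
    __syncthreads();

    if (tx < tw && ty < th && ox < nx && oy < ny) {
        fp_t sum = 0;
        for (int j = 0; j < mh; j++)
            for (int i = 0; i < mw; i++)
                sum += mask[j * mw + i] * tile[(ty + j) * sw + (tx + i)];
        out[oy * nx + ox] = sum;
    }
}

int main()
{
    const int nx = 64, ny = 64;  // hypothetical domain size
    const int mw = 3, mh = 3;    // hypothetical mask size
    const int tw = 14, th = 14;  // runtime tile size, e.g. read from a parameter file

    fp_t *in, *out, *mask;
    cudaMallocManaged(&in, nx * ny * sizeof(fp_t));
    cudaMallocManaged(&out, nx * ny * sizeof(fp_t));
    cudaMallocManaged(&mask, mw * mh * sizeof(fp_t));
    for (int k = 0; k < nx * ny; k++) in[k] = 1.0;
    for (int k = 0; k < mw * mh; k++) mask[k] = 1.0 / (mw * mh);  // box average

    const dim3 threads(tw + mw - 1, th + mh - 1);
    const dim3 blocks((nx + tw - 1) / tw, (ny + th - 1) / th);
    // Third launch argument: dynamic shared memory size in bytes, not elements.
    const size_t smem_size = threads.x * threads.y * sizeof(fp_t);

    convolution_kernel<<<blocks, threads, smem_size>>>(in, out, mask, nx, ny, mw, mh);
    const cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Away from the boundary, averaging a field of ones should give roughly one.
    printf("out at domain center = %f\n", out[(ny / 2) * nx + nx / 2]);

    cudaFree(in); cudaFree(out); cudaFree(mask);
    return 0;
}
```

With this layout the tile size only appears in the launch configuration, so changing `tw` and `th` at run time requires no recompilation, provided the resulting block stays within the device limits on threads per block and shared memory per block.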

Restore bs parameter in params.txt and eliminate globally defined constants TILE_H and TILE_W.

Refs: