dynamic tile size

Shared-memory tiles on the GPU are currently allocated statically, with dimensions fixed at compile time, e.g.

__shared__ fp_t d_conc_tile[TILE_H + MAX_MASK_H - 1][TILE_W + MAX_MASK_W - 1];

This is unnecessary: CUDA and OpenCL both let the size of shared memory (called local memory in OpenCL) be set at kernel launch. In CUDA, the tile is declared as an unsized extern array, e.g.

extern __shared__ fp_t d_conc_tile[];
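A minimal sketch of a kernel built on the unsized extern array, assuming the tile size equals the thread-block size (blockDim) and the mask dimensions are kernel arguments. Only `convolution_kernel`, `fp_t`, and `d_conc_tile` come from this issue; the parameter names (`d_in`, `d_out`, `d_mask`, `nx`, `ny`, `mask_w`, `mask_h`) and the helper `tile_idx` are hypothetical, `fp_t` is typedef'd to `float` only to keep the block self-contained, and out-of-domain cells are zero-padded. This is not hiperc's actual kernel.

```cuda
#include <cuda_runtime.h>

typedef float fp_t;  // assumption: hiperc may define fp_t differently

// Row-major flattened index; the extern shared array is 1-D.
__host__ __device__ inline int tile_idx(int row, int col, int stride)
{
    return row * stride + col;
}

// Hypothetical 2-D convolution: each block stages its tile plus a
// (mask - 1) halo in dynamically sized shared memory.
__global__ void convolution_kernel(const fp_t* d_in, fp_t* d_out,
                                   const fp_t* d_mask,
                                   int nx, int ny, int mask_w, int mask_h)
{
    extern __shared__ fp_t d_conc_tile[];  // size set by the launch config

    const int tile_w = blockDim.x;
    const int tile_h = blockDim.y;
    const int smem_w = tile_w + mask_w - 1;  // tile width including halo
    const int smem_h = tile_h + mask_h - 1;  // tile height including halo
    const int half_w = mask_w / 2;
    const int half_h = mask_h / 2;

    // Global origin of this block's output tile.
    const int x0 = blockIdx.x * tile_w;
    const int y0 = blockIdx.y * tile_h;

    // Cooperative load: threads stride over the larger halo'd tile.
    for (int j = threadIdx.y; j < smem_h; j += tile_h) {
        for (int i = threadIdx.x; i < smem_w; i += tile_w) {
            const int gx = x0 + i - half_w;
            const int gy = y0 + j - half_h;
            const bool inside = gx >= 0 && gx < nx && gy >= 0 && gy < ny;
            d_conc_tile[tile_idx(j, i, smem_w)] = inside ? d_in[gy * nx + gx] : fp_t(0);
        }
    }
    __syncthreads();  // whole tile must be loaded before any thread reads it

    const int x = x0 + threadIdx.x;
    const int y = y0 + threadIdx.y;
    if (x >= nx || y >= ny) return;  // partial tile at the domain edge

    fp_t sum = 0;
    for (int mj = 0; mj < mask_h; ++mj)
        for (int mi = 0; mi < mask_w; ++mi)
            sum += d_mask[mj * mask_w + mi] *
                   d_conc_tile[tile_idx(threadIdx.y + mj, threadIdx.x + mi, smem_w)];
    d_out[y * nx + x] = sum;
}
```

Because the shared array has no compile-time shape, the row stride `smem_w` is computed from runtime values, which is what allows TILE_W and TILE_H to disappear.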

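Because the dynamic array is one-dimensional, 2-D tile accesses must be flattened by hand (row * stride + col). The array size, in bytes, is passed as the third argument of the kernel execution configuration, e.g.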

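// the third launch parameter is the dynamic shared memory size in BYTES;
// the halo adds (mask - 1) cells in each direction, matching the static tile above
const size_t smem_size = (tileWidth + maskWidth - 1) * (tileHeight + maskHeight - 1) * sizeof(fp_t);
convolution_kernel<<<blocks, threads, smem_size>>>(...);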
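The byte count can be wrapped in a helper and checked against the device's per-block shared memory limit before launching, since an oversized request makes the launch fail. A sketch; `tile_smem_bytes` and `smem_fits` are hypothetical names, and `fp_t` is assumed to be `float`. `cudaDeviceGetAttribute` and `cudaDevAttrMaxSharedMemoryPerBlock` are standard CUDA runtime API.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

typedef float fp_t;  // assumption: hiperc may define fp_t differently

// Bytes of dynamic shared memory for one tile plus its (mask - 1) halo.
inline size_t tile_smem_bytes(int tile_w, int tile_h, int mask_w, int mask_h)
{
    return static_cast<size_t>(tile_w + mask_w - 1) *
           static_cast<size_t>(tile_h + mask_h - 1) * sizeof(fp_t);
}

// True if `bytes` fits in the default per-block shared memory of device `dev`.
inline bool smem_fits(size_t bytes, int dev)
{
    int limit = 0;
    if (cudaDeviceGetAttribute(&limit, cudaDevAttrMaxSharedMemoryPerBlock, dev) != cudaSuccess)
        return false;  // treat a failed query as "does not fit"
    return bytes <= static_cast<size_t>(limit);
}
```

Usage: `const size_t smem_size = tile_smem_bytes(tileWidth, tileHeight, maskWidth, maskHeight);` followed by the launch only if `smem_fits(smem_size, 0)` holds. Requests above the default limit (typically 48 KB) additionally require opting in with `cudaFuncSetAttribute(..., cudaFuncAttributeMaxDynamicSharedMemorySize, ...)` on devices that support it.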

Restore the bs parameter in params.txt and eliminate the globally defined constants TILE_H and TILE_W.
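With bs read at runtime, the launch geometry can be derived from it instead of from TILE_W and TILE_H. A sketch assuming square bs x bs thread blocks with one thread per output cell; `LaunchConfig` and `make_launch_config` are hypothetical, and how bs is parsed from params.txt is not shown.

```cuda
#include <cuda_runtime.h>

// Hypothetical bundle of the first two execution-configuration arguments.
struct LaunchConfig {
    dim3 blocks;
    dim3 threads;
};

// Derive grid and block dimensions from the runtime block size bs.
inline LaunchConfig make_launch_config(int nx, int ny, int bs)
{
    LaunchConfig cfg;
    cfg.threads = dim3(bs, bs, 1);
    // Ceiling division so a partial tile at the domain edge still gets a block.
    cfg.blocks = dim3((nx + bs - 1) / bs, (ny + bs - 1) / bs, 1);
    return cfg;
}
```

Since bs x bs threads share one block, bs must satisfy bs * bs <= 1024 (the per-block thread limit on current GPUs), so bs = 32 is the largest square block.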

Refs:

usnistgov / hiperc