CoinCheung opened this issue 4 years ago
I'm pretty sure you need to specify a size for the shared array sdata.
Here, if I understood correctly, it's 2 * 512?
You should probably define these numbers at the beginning of your file so that the size is set at compile time, but you still keep some flexibility (see the sketch below). For an example in a real project, see https://github.com/ClementPinard/Pytorch-Correlation-extension/blob/master/Correlation_Module/correlation_cuda_kernel.cu#L47
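For illustration, a minimal sketch of what I mean (kernel and variable names are placeholders, not your actual code): the block size is a compile-time constant, so the shared array has a fixed size and the launch needs no extra shared-memory argument.

#define THREADS_PER_BLOCK 512

template <typename scalar_t>
__global__ void reduce_kernel(const scalar_t *in, scalar_t *out, int n) {
    // Statically sized shared array: 2 * 512 elements, matching the guess above.
    __shared__ scalar_t sdata[2 * THREADS_PER_BLOCK];
    int tid = threadIdx.x;
    int idx = blockIdx.x * (2 * blockDim.x) + tid;
    // Each thread loads two elements so the whole array gets filled.
    sdata[tid] = (idx < n) ? in[idx] : static_cast<scalar_t>(0);
    sdata[tid + THREADS_PER_BLOCK] =
        (idx + blockDim.x < n) ? in[idx + blockDim.x] : static_cast<scalar_t>(0);
    __syncthreads();
    // ... reduction over sdata elided ...
    if (tid == 0) out[blockIdx.x] = sdata[0];
}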
@ClementPinard Thanks for replying! As far as I know, CUDA seems to support dynamically allocated shared memory within a block, declared like extern __shared__ scalar_t sdata[]. Doesn't PyTorch support that? What if the size of the shared memory is not known at compile time, do we have an option to set the size dynamically?
Ah, actually you are right, you can use dynamic shared arrays. Apparently you need to specify the shared object size as an extra option in the kernel call. See here: https://devblogs.nvidia.com/using-shared-memory-cuda-cc/
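For reference, a minimal sketch of that pattern from the blog post (placeholder names, float only to keep it short): the array is declared extern __shared__ with no size, and the byte count goes in the third argument of the launch configuration.

__global__ void dyn_smem_kernel(const float *in, float *out, int n) {
    extern __shared__ float sdata[];  // size supplied at launch time
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (idx < n) ? in[idx] : 0.f;
    __syncthreads();
    // ... work on sdata elided ...
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Launch with the dynamic shared-memory size in bytes as the third parameter:
// dyn_smem_kernel<<<grid, block, block.x * sizeof(float)>>>(in, out, n);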
But I did call the kernel function with the shared memory size assigned; I called it like this:
TestForward<scalar_t><<<grid, block, 4096, at::cuda::getCurrentCUDAStream()>>>();
I assigned 4K of shared memory to each block this way. Would you please tell me why this does not work?
Ok, sorry about misleading you, your code is mostly fine. I tried your code, and the problem seems to come from the template and the fact that you use three different specializations of it (float, double and half), because CUDA doesn't allow two differently typed dynamic shared arrays with the same name ¯\_(ツ)_/¯
See here for more info: https://stackoverflow.com/questions/27570552/templated-cuda-kernel-with-dynamic-shared-memory
In the end, you need to replace the line
extern __shared__ scalar_t sdata[];
with the two lines:
extern __shared__ __align__(sizeof(scalar_t)) unsigned char sdata_uchar[];
scalar_t *sdata = reinterpret_cast<scalar_t *>(sdata_uchar);
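For context, a minimal sketch of how that fix slots into a full templated kernel plus its dispatch and launch (the kernel body, the block/grid setup and the host function are placeholders, not your actual code):

#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>

template <typename scalar_t>
__global__ void TestForward(const scalar_t *in, scalar_t *out, int n) {
    // One untyped dynamic shared buffer, reinterpreted per instantiation.
    extern __shared__ __align__(sizeof(scalar_t)) unsigned char sdata_uchar[];
    scalar_t *sdata = reinterpret_cast<scalar_t *>(sdata_uchar);

    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (idx < n) ? in[idx] : static_cast<scalar_t>(0);
    __syncthreads();
    // ... actual computation elided ...
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

void test_forward(torch::Tensor in, torch::Tensor out) {
    const int threads = 512;
    const int n = in.numel();
    const dim3 block(threads);
    const dim3 grid((n + threads - 1) / threads);

    AT_DISPATCH_FLOATING_TYPES_AND_HALF(in.scalar_type(), "test_forward", [&] {
        // Third launch argument: dynamic shared-memory size in bytes.
        TestForward<scalar_t><<<grid, block, threads * sizeof(scalar_t),
                                at::cuda::getCurrentCUDAStream()>>>(
            in.data_ptr<scalar_t>(), out.data_ptr<scalar_t>(), n);
    });
}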
Thanks!!! It works now, but I have two more warnings:
/miniconda/envs/py36/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/profiler.h(97): warning: attribute "__visibility__" does not apply here
warning: specified alignment (4) is different from alignment (8) specified on a previous declaration
detected during instantiation of "void compute_numer_denor(int, const scalar_t *, const int64_t *, scalar_t *, scalar_t *, float, float) [with scalar_t=float]"
Will my code be fine if it goes out with these two warnings?
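A note on those two warnings (a guess at the cause, not a confirmed fix): the profiler.h visibility warning comes from the PyTorch headers rather than from this extension, so it can usually be ignored. The alignment warning appears because __align__(sizeof(scalar_t)) expands to 4 for the float instantiation and 8 for the double one on the same extern symbol. Assuming double is the widest type being instantiated, one way to silence it is to pin a single alignment for every instantiation:

template <typename scalar_t>
__global__ void compute_numer_denor(/* ... parameters as in the real kernel ... */) {
    // Fixed 8-byte alignment (enough for double) instead of sizeof(scalar_t),
    // so every instantiation declares the extern symbol with the same alignment.
    extern __shared__ __align__(sizeof(double)) unsigned char sdata_uchar[];
    scalar_t *sdata = reinterpret_cast<scalar_t *>(sdata_uchar);
    // ... kernel body elided ...
}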
I am working on Ubuntu 16.04 with PyTorch 1.3 installed from conda. My CUDA version is 10.1.243 and my cuDNN version is 7. I have 8 T4 GPUs on my server, and the gcc version is the default 5.4.
The simplified version of my code main.cu is like this:

and the setup.py is like this:

I compiled it with the command python setup.py install, and the error message is like this:

What is the cause of this, and how could I cope with this problem, please?