Closed xyltt closed 3 years ago
Same issue here. When the tensors are placed on a device other than cuda:0, the output is all zeros.
To reproduce the error:
import torch
from fast_transformers.causal_product import causal_dot_product
q = k = v = torch.randn(5, 10, 10, 10).to(0)
print(causal_dot_product(q, k, v)) # this should produce the right result.
q = k = v = torch.randn(5, 10, 10, 10).to(1)
print(causal_dot_product(q, k, v)) # the output is all zeros
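To check a GPU result against a known-good value, the quantity causal_dot_product computes (per batch and head) can be written out directly. Below is a pure-Python O(L^2) sketch of that math; the function name causal_dot_product_ref is my own, not part of the library.

```python
def causal_dot_product_ref(q, k, v):
    """Reference causal dot product for a single attention head.

    q, k, v are lists of equal-length feature vectors (lists of floats).
    out[i] = sum over j <= i of (q[i] . k[j]) * v[j], i.e. unnormalized
    linear attention restricted to past positions.
    """
    out = []
    for i, qi in enumerate(q):
        acc = [0.0] * len(v[0])
        for j in range(i + 1):
            # dot product of the i-th query with the j-th key
            score = sum(a * b for a, b in zip(qi, k[j]))
            # accumulate score-weighted value vector
            for d in range(len(acc)):
                acc[d] += score * v[j][d]
        out.append(acc)
    return out
```

Comparing the CUDA output against this reference on each device makes the all-zeros failure obvious immediately.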
Hi @angeloskath! When do you plan to fix the bug?
@katie-cathy-hunt I will push a fix today. Sorry this took so long.
Cheers, Angelos
@angeloskath Thanks for the quick response and help!
@angeloskath I just rebuilt my environment to try your patch, but I am running into a new issue:
>>> import torch
>>> from fast_transformers.causal_product import causal_dot_product
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dccstor/bmbelgod1/projects/fast-transformers/fast_transformers/causal_product/__init__.py", line 9, in <module>
from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ModuleNotFoundError: No module named 'fast_transformers.causal_product.causal_product_cpu'
I can import fast_transformers, but if I try to import fast_transformers.causal_product I get the same error.
I verified I had pulled your fix
sed -n 59,63p fast_transformers/aggregate/aggregate_cuda.cu
) {
// Make sure that we are using the correct GPU device
torch::DeviceGuard _guard(X.device());
int N = X.size(0);
and it's in the environment
pip list | grep fast
pytorch-fast-transformers 0.3.0
No errors in the build/install log
Hmm, that is weird. What did you do to rebuild? Could I bother you to do a rm -r build and then rebuild?
(Next step should be to provide prebuilt binaries for common setups to avoid all these issues)
I thought I may have induced the error myself: I am using a conda environment with CUDA installed via conda, which only installs the shared libraries, not nvcc. Looking through your setup.py, it doesn't produce an error or message if it doesn't find nvcc. I then loaded the module to add CUDA 11 (the same version PyTorch is compiled against) to my path.
I verified that call(["nvcc"], stdout=DEVNULL, stderr=DEVNULL) returned 1
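That check can be wrapped as a small helper to confirm nvcc is actually on PATH; the name has_nvcc is mine, a sketch rather than the library's code.

```python
from subprocess import call, DEVNULL

def has_nvcc():
    """Return True if the nvcc binary is on PATH.

    nvcc invoked with no arguments exits with a nonzero status (hence
    "returned 1" above), but the call only completes at all if the
    binary was found; a missing binary raises FileNotFoundError.
    """
    try:
        call(["nvcc"], stdout=DEVNULL, stderr=DEVNULL)
        return True
    except FileNotFoundError:
        return False
```

Exit code 1 here is therefore expected and does not indicate a missing compiler.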
Then I removed the build and dist directories and ran python setup.py install. Still no luck:
python
Python 3.7.9 | packaged by conda-forge | (default, Dec 9 2020, 21:08:20)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from fast_transformers.causal_product import causal_dot_product
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dccstor/bmbelgod1/projects/fast-transformers/fast_transformers/causal_product/__init__.py", line 9, in <module>
from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ModuleNotFoundError: No module named 'fast_transformers.causal_product.causal_product_cpu'
This is on RHEL 8.2, Python 3.7.9, PyTorch 1.7.1.
@angeloskath I apologize, everything is working correctly. I started a Python REPL in the fast-transformers source directory after the install, so Python picked up the local fast_transformers subdirectory (which lacks the compiled extensions) before the installed package. My mistake!
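This kind of shadowing can be detected programmatically by checking where a package would actually be imported from. A minimal sketch (the helper name imported_from_cwd is hypothetical):

```python
import importlib.util
import os

def imported_from_cwd(pkg_name):
    """Return True if pkg_name would resolve to a module under the
    current working directory (e.g. a source checkout shadowing the
    installed copy) rather than an installed location."""
    spec = importlib.util.find_spec(pkg_name)
    if spec is None or not spec.origin or spec.origin == "built-in":
        return False
    origin = os.path.abspath(spec.origin)
    # True only when the resolved file lives inside the cwd tree
    return origin.startswith(os.getcwd() + os.sep)
```

Running this (or simply printing fast_transformers.__file__) from inside the source checkout would have revealed that the local tree, not the installed package, was being imported.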
Hi, my machine has 4 GPUs, but when I use GPU 1 (where the default GPU is 0), I found the CUDA code is computed on GPU 0. Also, the code cannot run on multiple GPUs at once: there is an out-of-memory error.