teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch
https://pypi.org/project/torchsort/
Apache License 2.0

Unable to install with CUDA #52

Closed sachit-menon closed 2 years ago

sachit-menon commented 2 years ago

Hi, I'm excited to use this package but unfortunately am having issues getting it working with CUDA. I am using a conda env and have followed the steps in the README related to that. My torch version is 1.11.0 and my cudatoolkit version is 11.3.1. My Python version is 3.8.13 on a Linux machine if that is relevant.

From a fresh environment:

conda install -c pytorch pytorch torchvision cudatoolkit=11.3
pip install torchsort 

then if I try to use torchsort on a CUDA tensor, I get ImportError: You are trying to use the torchsort CUDA extension, but it looks like it is not available. Make sure you have the CUDA toolchain installed, and reinstall torchsort with pip install --force-reinstall --no-cache-dir torchsort to rebuild the extension. (which I have tried a few times now).
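
For reference, a minimal snippet along these lines triggers it (the shape and values are arbitrary; any CUDA tensor does):

import torch
import torchsort

x = torch.randn(1, 8, device="cuda")  # any CUDA tensor
torchsort.soft_sort(x)                # raises the ImportError above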

Any help in getting this working would be amazing! Thanks so much!

teddykoker commented 2 years ago

Hi @sachit-menon, I assume it is working alright on non-CUDA tensors?

In addition, are you getting any error messages when you install? If so could you share them?

In setup.py, it checks for the availability of nvcc to determine whether to build the CUDA extension. Installing cudatoolkit like you have should provide it (I think), but could you double check that the nvcc command is available?
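
Conceptually the check is something like this (a simplified sketch, not the exact setup.py code):

import shutil

def cuda_toolchain_available() -> bool:
    # the CUDA extension is only built if the nvcc compiler can be found on PATH
    return shutil.which("nvcc") is not None

So if nvcc isn't visible at install time, only the CPU extension gets built.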

sachit-menon commented 2 years ago

Hi @teddykoker, thanks for the prompt reply -- I suspect that's the issue! I have cudatoolkit installed, but nvcc yields "command not found" -- my understanding (with help from a bit of Googling) is that cudatoolkit only covers the runtime and doesn't expose developer tools like nvcc. Would it be possible to check whether to build the CUDA extension another way? (The nvidia-smi command works for me, as a side note.)

I'm not getting any error messages at installation and CPU seems to work.
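
In case it's useful, here is roughly what the environment looks like from Python on my machine (the commented values are what I see):

import shutil
import torch

print(torch.__version__)          # 1.11.0
print(torch.version.cuda)         # 11.3 -- the runtime side looks fine
print(torch.cuda.is_available())  # True
print(shutil.which("nvcc"))       # None -- the compiler is not on PATH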

teddykoker commented 2 years ago

Ah, that makes sense! nvcc (the CUDA compiler) is necessary for compiling the CUDA extension. It looks like there are a number of ways to install nvcc using conda (not sure which one is recommended; cudatoolkit-dev looks promising).

sachit-menon commented 2 years ago

Progress! After that, I still ran into a few other issues, but I think (fingers crossed) I've worked them all out now. I'll describe them here, with a summary at the end that you could add to the README, since I think most/all conda users would hit this now (unlike when the previous conda instructions were written).

So after you noted nvcc was required, I installed cudatoolkit-dev as you suggested. This resulted in a long and complicated error message, but at least it started trying to build with CUDA.

Within that mess, it seemed I had to install a particular version of gcc instead of what conda now installs by default (I think the default was still suitable when the previous README edit was made). So I installed version 9.4.0, which is below the version 10 limit mentioned in the error message.

Then, when reinstalling, I found that the pip command given in #23 #29 #33 #44 produces a new build error from a CUDA version mismatch: torch gets reinstalled in that case with a build that may not match the user's CUDA/toolkit version. Instead, I had to install torch first (as is typical), then modify the command so it doesn't reinstall dependencies (which is what created the CUDA conflict loop).

So in summary, the modified steps are:

  1. Install g++ with conda install -c conda-forge gxx_linux-64=9.4.0
  2. Point CXX at the conda-provided compiler: export CXX=/path/to/miniconda3/envs/env_name/bin/x86_64-conda_cos6-linux-gnu-g++
  3. Add the conda lib directory to LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/miniconda3/lib
  4. pip install --force-reinstall --no-cache-dir --no-deps torchsort

This results in output matching what's described in the README, with a gradient for both functions.

import torch
import torchsort

x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
y = torchsort.soft_sort(x)

torch.autograd.grad(y[0, 0], x)
# (tensor([[0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111, 0.1111]],
#         device='cuda:0'),)

y2 = torchsort.soft_rank(x)
torch.autograd.grad(y2[0, 0], x)
# (tensor([[ 0.8000,  0.0000, -0.2000,  0.0000,  0.0000,  0.0000, -0.2000, -0.2000,
#          -0.2000]], device='cuda:0'),)

(As a side note, I know this is the correct output, but I was wondering if you have any straightforward intuition for why these are the gradients we get? The ranking one especially is a bit confusing to me. Maybe not suitable for a GitHub issue, but I figured I'd ask while I'm staring at it!)

teddykoker commented 2 years ago

Thank you for trying all of this! I updated the instructions in the README to reflect your steps.

In terms of why these are the gradients we are getting, I think it is best explained with a simple example:

Let's say we have an array x:

x = torch.tensor([[5., 4., 3., 2., 1.]], requires_grad=True)

x would currently have a ranking of [[5., 4., 3., 2., 1.]]

Now let's say we wanted to minimize the ranking of the 0th element of the array, in this case the 5. What would you have to do to all of the numbers in x to minimize the ranking of the 5? Obviously there are two options:

  1. Decrease the 5
  2. Increase the rest of the numbers

So, it makes sense that the gradient of the 0th element's ranking would be positive with respect to the 0th element, and negative with respect to all other elements. We can see this in practice by performing gradient descent:

import torch
import torchsort
x = torch.tensor([[5., 4., 3., 2., 1.]], requires_grad=True)

for step in range(10):
    y = torchsort.soft_rank(x, regularization_strength=1)
    loss = y[0,0] # we want to minimize the ranking of the 0th element (and 0th batch)
    loss.backward()
    print("x:", x)
    print("x.grad:", x.grad)
    print("y:", y)
    print()
    with torch.no_grad():
        x -= x.grad
        x.grad.zero_()

x: tensor([[5., 4., 3., 2., 1.]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[5., 4., 3., 2., 1.]], grad_fn=<SoftRankBackward>)

x: tensor([[4.2000, 4.2000, 3.2000, 2.2000, 1.2000]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[4.2000, 4.2000, 3.2000, 2.2000, 1.2000]], grad_fn=<SoftRankBackward>)

x: tensor([[3.4000, 4.4000, 3.4000, 2.4000, 1.4000]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[3.4000, 4.4000, 3.4000, 2.4000, 1.4000]], grad_fn=<SoftRankBackward>)

x: tensor([[2.6000, 4.6000, 3.6000, 2.6000, 1.6000]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[2.6000, 4.6000, 3.6000, 2.6000, 1.6000]], grad_fn=<SoftRankBackward>)

x: tensor([[1.8000, 4.8000, 3.8000, 2.8000, 1.8000]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[1.8000, 4.8000, 3.8000, 2.8000, 1.8000]], grad_fn=<SoftRankBackward>)

x: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], requires_grad=True)
x.grad: tensor([[ 0.8000, -0.2000, -0.2000, -0.2000, -0.2000]])
y: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], grad_fn=<SoftRankBackward>)

x: tensor([[0.2000, 5.2000, 4.2000, 3.2000, 2.2000]], requires_grad=True)
x.grad: tensor([[0., 0., 0., 0., 0.]])
y: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], grad_fn=<SoftRankBackward>)

x: tensor([[0.2000, 5.2000, 4.2000, 3.2000, 2.2000]], requires_grad=True)
x.grad: tensor([[0., 0., 0., 0., 0.]])
y: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], grad_fn=<SoftRankBackward>)

x: tensor([[0.2000, 5.2000, 4.2000, 3.2000, 2.2000]], requires_grad=True)
x.grad: tensor([[0., 0., 0., 0., 0.]])
y: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], grad_fn=<SoftRankBackward>)

x: tensor([[0.2000, 5.2000, 4.2000, 3.2000, 2.2000]], requires_grad=True)
x.grad: tensor([[0., 0., 0., 0., 0.]])
y: tensor([[1.0000, 5.0000, 4.0000, 3.0000, 2.0000]], grad_fn=<SoftRankBackward>)

You'll see that at each step there is a positive gradient on the 0th element and a negative gradient on the rest, until the 0th element reaches a value less than all of the other elements, giving it the lowest rank.
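
As for the exact numbers (the 0.8 and -0.2), you can also inspect the full Jacobian of soft_rank numerically; each row is the gradient of one output rank with respect to every input, and the first row matches the x.grad printed above. A quick check (this just uses autograd, nothing beyond what's already shown):

import torch
import torchsort
from torch.autograd.functional import jacobian

x = torch.tensor([[5., 4., 3., 2., 1.]])

# full Jacobian of soft_rank at x; after squeezing, entry [i, j] is d(rank_i) / d(x_j)
J = jacobian(lambda t: torchsort.soft_rank(t, regularization_strength=1), x)
print(J.squeeze())
# the first row is [ 0.8, -0.2, -0.2, -0.2, -0.2], matching x.grad above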

Hopefully this makes sense! Happy to answer any other questions (though, just to be clear, I am not the author of the original paper).

sachit-menon commented 2 years ago

Hi, just wanted to say thanks so much for the great explanation -- that makes a lot of sense now! I'm working out whether this fits my use case, and I'm sure more questions will pop up as I do that. I appreciate your hard work on the library!

teddykoker commented 2 years ago

Great! Always happy to help :)