Open Totemi1324 opened 4 years ago
I haven't tried building with CUDA11 yet.
Maybe the error can be fixed with the following change. Also, these functions (cutorch.isManaged, cutorch.toCudaUVATensor and cutorch.toFloatUVATensor) are probably not called from any program.
diff --git a/init.c b/init.c
index 8b32a1a..a2307bb 100644
--- a/init.c
+++ b/init.c
@@ -935,7 +935,7 @@ static int cutorch_isManagedPtr(lua_State *L)
lua_pushboolean(L, 0);
} else {
THCudaCheck(res);
- lua_pushboolean(L, attributes.isManaged);
+ lua_pushboolean(L, attributes.type == cudaMemoryTypeManaged);
}
return 1;
}
Hello, I tried your solution and it seems to work. However, a new error showed up. I suppose it has nothing to do with the fix you provided and would have appeared anyway.
/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(95): error: identifier "cusparseScsrmm" is undefined
/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(194): error: identifier "cusparseScsrmm" is undefined
/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(97): error: identifier "cusparseDcsrmm" is undefined
/home/tamas/torch/extra/cunn/lib/THCUNN/generic/SparseLinear.cu(196): error: identifier "cusparseDcsrmm" is undefined
4 errors detected in the compilation of "/home/tamas/torch/extra/cunn/lib/THCUNN/SparseLinear.cu".
CMake Error at THCUNN_generated_SparseLinear.cu.o.cmake:267 (message):
Error generating file
/home/tamas/torch/extra/cunn/build/lib/THCUNN/CMakeFiles/THCUNN.dir//./THCUNN_generated_SparseLinear.cu.o
make[2]: *** [lib/THCUNN/CMakeFiles/THCUNN.dir/build.make:268: lib/THCUNN/CMakeFiles/THCUNN.dir/THCUNN_generated_SparseLinear.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:111: lib/THCUNN/CMakeFiles/THCUNN.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
Error: Build error: Failed building.
Any ideas of what I can do?
Nvidia removed those functions in the CUDA 11 release after deprecating them. The docs recommend a different one, but it takes different arguments and would require reworking the matrices in that file to make them fit. Does anyone know what this THCUNN lib even is?
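To get a feel for how many call sites would need porting, one option is to search the source tree for the two identifiers from the error log. This is only a rough sketch: the function name `find_csrmm_calls` is made up for illustration, and it assumes a `grep` that supports `-r` (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: list every line under a directory that mentions
# cusparseScsrmm or cusparseDcsrmm (the identifiers reported as undefined).
find_csrmm_calls() {
  # -r: recurse into subdirectories
  # -n: print line numbers
  # -E: extended regex, [SD] matches the float and double variants
  grep -rnE 'cusparse[SD]csrmm' "$1"
}
```

For example, `find_csrmm_calls extra/cunn/lib/THCUNN` run from the torch checkout should print lines in the form `file:line:text` for each match. The function returns a non-zero status when nothing matches.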
It is a linear module for the sparse matrix format. I haven't used sparse matrices in torch7, so I could fix it, but I'm not confident I could test it. If you're not using it, I think the easiest solution is to remove it from the library.
If you're not using it, I think the easiest solution is to remove it from the library.
How to remove? you mean remove the whole dir extra/cunn?
I tried simply renaming these two files:
$ mv extra/cunn/lib/THCUNN/generic/SparseLinear.cu extra/cunn/lib/THCUNN/generic/SparseLinear.cu.orig
$ mv extra/cunn/lib/THCUNN/SparseLinear.cu extra/cunn/lib/THCUNN/SparseLinear.cu.orig
Torch complains about some undefined symbols (e.g. THNN_CudaSparseLinear_updateOutput) but otherwise seems to work, as long as your code does not call any functions in these files.
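If you want the rename to be easy to undo, it can be wrapped in two small shell functions. This is just a sketch under assumptions: the names `disable_sparse_linear` and `restore_sparse_linear` are hypothetical, the two relative paths are the ones from the mv commands above, and the root of the torch checkout is passed as the first argument.

```shell
#!/bin/sh
# Relative paths of the two files renamed by the mv commands above.
SPARSE_FILES="extra/cunn/lib/THCUNN/generic/SparseLinear.cu extra/cunn/lib/THCUNN/SparseLinear.cu"

# Hypothetical helper: rename each SparseLinear.cu to SparseLinear.cu.orig.
# $1 is the root of the torch checkout. Missing files are skipped.
disable_sparse_linear() {
  root="$1"
  for f in $SPARSE_FILES; do
    if [ -f "$root/$f" ]; then
      mv "$root/$f" "$root/$f.orig"
    fi
  done
}

# Hypothetical helper: undo the rename by moving each .orig file back.
restore_sparse_linear() {
  root="$1"
  for f in $SPARSE_FILES; do
    if [ -f "$root/$f.orig" ]; then
      mv "$root/$f.orig" "$root/$f"
    fi
  done
}
```

For example, call `disable_sparse_linear "$HOME/torch"` before rebuilding and `restore_sparse_linear "$HOME/torch"` to put the files back (the checkout location here is an assumption, adjust it to yours).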
How to remove? you mean remove the whole dir extra/cunn ?
No, I mean removing only the functions related to sparse matrices where CUDA is used. However, I have not tried it. On Ubuntu 21.04, qt4 has also been removed and there is no PPA package. I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).
RTX 30 series cards only support CUDA 11, so we cannot run Torch on the latest cards right now.
How to remove? you mean remove the whole dir extra/cunn ?
No, I mean removing only the functions related to sparse matrices where CUDA is used. However, I have not tried it. On Ubuntu 21.04, qt4 has also been removed and there is no PPA package. I think it is better to use the Docker version (Ubuntu 18.04 and CUDA 10).
Hello,
I tried to follow this approach and everything works fine.
Just comment out these lines in SparseLinear.cu: Line 94
/*#ifdef THC_REAL_IS_FLOAT
cusparseScsrmm(cusparse_handle,
#elif defined(THC_REAL_IS_DOUBLE)
cusparseDcsrmm(cusparse_handle,
#endif
CUSPARSE_OPERATION_NON_TRANSPOSE,
batchnum, outDim, inDim, nnz,
&one,
descr,
THCTensor_(data)(state, values),
THCudaIntTensor_data(state, csrPtrs),
THCudaIntTensor_data(state, colInds),
THCTensor_(data)(state, weight), inDim,
&one, THCTensor_(data)(state, buffer), batchnum
);*/
Line 193
/*#ifdef THC_REAL_IS_FLOAT
cusparseScsrmm(cusparse_handle,
#elif defined(THC_REAL_IS_DOUBLE)
cusparseDcsrmm(cusparse_handle,
#endif
CUSPARSE_OPERATION_NON_TRANSPOSE,
inDim, outDim, batchnum, nnz,
&one,
descr,
THCTensor_(data)(state, values),
THCudaIntTensor_data(state, colPtrs),
THCudaIntTensor_data(state, rowInds),
THCTensor_(data)(state, buf), batchnum,
&one, THCTensor_(data)(state, gradWeight), inDim
);*/
Hello, first of all: many thanks for making these modifications, they fixed a whole lot of my problems with installing Torch so far! During the install process, though, I came across an error that is likely due to the new version of CUDA. CUDA 11 came out recently, and when I tried to build with it, the following error appeared:
It seems that the attribute used is deprecated and no longer supported (see https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes). Is there a chance you can fix this, or am I forced to switch to CUDA 10.1?