jiangzhongshi / SurfaceNetworks

Source code for CVPR 2018 Oral paper "Surface Networks"

cupy.cuda.driver.CUDADriverError: CUDA_ERROR_NOT_INITIALIZED: initialization error #2

Open finerc opened 5 years ago

finerc commented 5 years ago

Thank you for the great code! I have a problem. When I run the program on the GPU, the output is as follows:

Load data
Preprocess Dataset
100% (60000 of 60000) |####################| Elapsed Time: 0:00:20 Time: 0:00:20
100% (10000 of 10000) |####################| Elapsed Time: 0:00:03 Time: 0:00:03
Num parameters 90314
N/A% (0 of 937) | | Elapsed Time: 0:00:00 ETA: --:--:--
Traceback (most recent call last):
  File "main.py", line 213, in <module>
    main()
  File "main.py", line 155, in main
    outputs = model(inputs, laplacian, mask)
  File "/home/jiang/work/ping/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiang/work/ping/SurfaceNetworks/src/mesh_mnist/models.py", line 43, in forward
    x = self._modules['rn{}'.format(i)](L, mask, x)
  File "/home/jiang/work/ping/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jiang/work/ping/SurfaceNetworks/src/utils/utils_pt.py", line 125, in forward
    xs = [x, SparseBMMFunc()(L, x)]
  File "/home/jiang/work/ping/SurfaceNetworks/src/utils/cuda/sparse_bmm_func.py", line 39, in forward
    col_ind, col_ptr = batch_csr(matrix1._indices(), matrix1.size())
  File "/home/jiang/work/ping/SurfaceNetworks/src/utils/cuda/batch_csr.py", line 39, in __call__
    m.load(bytes(ptx.encode()))
  File "cupy/cuda/function.pyx", line 175, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 176, in cupy.cuda.function.Module.load
  File "cupy/cuda/driver.pyx", line 141, in cupy.cuda.driver.moduleLoadData
  File "cupy/cuda/driver.pyx", line 72, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_NOT_INITIALIZED: initialization error
100% (937 of 937) |########################| Elapsed Time: 0:00:00 Time: 0:00:00

Is there something wrong with how I am running it?

System information

jiangzhongshi commented 5 years ago

Hi,

Unfortunately, I am no CUDA expert. Can you try the dev branch that doesn't use cupy?
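
For readers following along, the traceback shows batch_csr being called with matrix1._indices() and matrix1.size() and returning col_ind, col_ptr, which suggests it turns COO indices into a CSR-style pointer array. The kernel itself is not shown in this thread, so the following is only a minimal plain-Python sketch of that kind of conversion, under that assumption. The function name coo_rows_to_csr_ptr is hypothetical and is not part of the repository.

```python
def coo_rows_to_csr_ptr(rows, num_rows):
    """Build a CSR pointer array from COO row indices.

    Hypothetical helper for illustration only; it is NOT the
    repository's batch_csr and ignores batching entirely.
    """
    # Count how many nonzeros fall in each row.
    counts = [0] * num_rows
    for r in rows:
        counts[r] += 1

    # Prefix-sum the counts: ptr[i] is where row i starts,
    # ptr[i + 1] is where it ends.
    ptr = [0]
    for c in counts:
        ptr.append(ptr[-1] + c)
    return ptr


# Four nonzeros in a matrix with 4 rows: two in row 0, one in row 1, one in row 3.
print(coo_rows_to_csr_ptr([0, 0, 1, 3], 4))
```

A CPU version like this needs no CUDA context at all, which is why a code path without cupy would avoid the CUDA_ERROR_NOT_INITIALIZED failure at this step.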

jiangzhongshi commented 4 years ago

Hi,

I believe in the dev branch, this function is no longer useful. Can you tell me which file is triggering this?

On Wed, Dec 4, 2019 at 10:40 PM SimonPig notifications@github.com wrote:

Hi, I have a similar issue: when I run utils_pt.py, I can't find 'SparseBMMFunc()'. Is that part of your work? Thank you.


SimonPig commented 4 years ago

That error went away after I refreshed the code, but now I have run into another one. In main() of the mesh_mnist task, line 109 is:

laplacian=utils.sparse_cat(laplacian,sample_batch.num_vertices,sample_batch_num_vertices)

Stepping into utils_pt.sparse_cat, I saw that sparse_cat() received a list of tensors with layout = torch.sparse_coo. Then, at for (...) value.append(tensor._values()), the interpreter reported that it failed to find a dispatch key 'CPUTensorId' for operator _values(). Following this message, I looked inside the for loop and found that the tensor there actually has layout = torch.strided, which indicates a dense tensor. Is that a bug, or is something wrong elsewhere? Thank you.
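
One way to narrow this down is to check every element before calling _values(), so that the first dense tensor is reported by its position instead of surfacing as a dispatch error. The sketch below is hedged: collect_values is a hypothetical helper, not the repository's sparse_cat, and it relies only on the is_sparse and layout attributes and the _values() method that PyTorch tensors expose.

```python
def collect_values(tensors):
    """Gather t._values() from each tensor, failing early on dense inputs.

    Hypothetical debugging helper; it only assumes each element behaves
    like a torch.Tensor (is_sparse, layout, _values()).
    """
    values = []
    for i, t in enumerate(tensors):
        # Dense (strided) tensors report is_sparse == False, and
        # _values() is only defined for sparse tensors.
        if not getattr(t, "is_sparse", False):
            layout = getattr(t, "layout", "unknown")
            raise TypeError(
                "element {} is not sparse (layout={})".format(i, layout)
            )
        values.append(t._values())
    return values
```

With PyTorch installed, collect_values([torch.eye(2).to_sparse(), torch.eye(2)]) raises a TypeError naming element 1, which points at the place where a dense tensor entered the list.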