microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

CUBLAS failure 11: CUBLAS_STATUS_MAPPING_ERROR when fitting a simple model with keras #2621

Open amal opened 6 years ago

amal commented 6 years ago

Nvidia GeForce GTX 1080, Core i7-7700K

A simple 3-layer Dense network with selu activation.

Epoch 58/10000
CUBLAS failure 11: CUBLAS_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=amal-laptop ; expr=cublasGetMatrix((int) numRows, (int) numCols, sizeof(ElemType), Data(), (int) GetNumRows(), dst, (int) colStride)
Traceback (most recent call last):
  File "/win/d/Apps/ML/ftech_experiment_1.py", line 394, in <module>
    epochs=EPOCHS, batch_size=BATCH_SIZE)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/keras/models.py", line 893, in fit
    initial_epoch=initial_epoch)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/keras/engine/training.py", line 1631, in fit
    validation_steps=validation_steps)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/keras/engine/training.py", line 1213, in _fit_loop
    outs = f(ins_batch)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/keras/backend/cntk_backend.py", line 1815, in __call__
    input_dict, self.trainer_output)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/cntk/train/trainer.py", line 163, in train_minibatch
    output_map[k] = _value_as_sequence_or_array(v, k)
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/cntk/internal/__init__.py", line 25, in _value_as_sequence_or_array
    return val.asarray()
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/cntk/tensor.py", line 254, in asarray
    result = ndav.to_ndarray()
  File "/home/amal/venv/cntk/lib/python3.5/site-packages/cntk/cntk_py.py", line 1060, in to_ndarray
    return _cntk_py.NDArrayView_to_ndarray(self)
RuntimeError: CUBLAS failure 11: CUBLAS_STATUS_MAPPING_ERROR ; GPU=0 ; hostname=amal-laptop ; expr=cublasGetMatrix((int) numRows, (int) numCols, sizeof(ElemType), Data(), (int) GetNumRows(), dst, (int) colStride)

[CALL STACK]
[0x7fb0b5051c2c]                                                       + 0x532c2c
[0x7fb0ad2add43]                                                       + 0xabcd43
[0x7fb0ad272d55]    Microsoft::MSR::CNTK::Matrix<float>::  AssignValuesOf  (Microsoft::MSR::CNTK::Matrix<float> const&) + 0x1b5
[0x7fb0b5287bb5]    CNTK::NDArrayView::  CopyFrom  (CNTK::NDArrayView const&) + 0x1b5
[0x7fb0b5c82c4b]    NDArrayViewToNumPy  (CNTK::NDArrayView const*)     + 0x13b
[0x7fb0b5c8593e]                                                       + 0x1ea93e
[0x4e9b9f]          PyCFunction_Call                                   + 0x4f
[0x524414]          PyEval_EvalFrameEx                                 + 0x614
[0x528814]          PyEval_EvalFrameEx                                 + 0x4a14
[0x52d82f]                                                            
[0x529332]          PyEval_EvalFrameEx                                 + 0x5532
[0x528814]          PyEval_EvalFrameEx                                 + 0x4a14
[0x52d82f]                                                            
[0x528eee]          PyEval_EvalFrameEx                                 + 0x50ee
[0x52e12b]          PyEval_EvalCodeEx                                  + 0x13b
[0x4ebcc3]                                                            
[0x5b7167]          PyObject_Call                                      + 0x47
[0x4f413e]                                                            
[0x5b7167]          PyObject_Call                                      + 0x47
[0x54d4f6]                                                            
[0x5b7167]          PyObject_Call                                      + 0x47
[0x528d06]          PyEval_EvalFrameEx                                 + 0x4f06
[0x52d2e3]                                                            
[0x528eee]          PyEval_EvalFrameEx                                 + 0x50ee
[0x52d2e3]                                                            
[0x528eee]          PyEval_EvalFrameEx                                 + 0x50ee
[0x52d2e3]                                                            
[0x528eee]          PyEval_EvalFrameEx                                 + 0x50ee
[0x52d2e3]                                                            
[0x52dfdf]          PyEval_EvalCode                                    + 0x1f
[0x5fd2c2]                                                            
[0x5ff76a]          PyRun_FileExFlags                                  + 0x9a
[0x5ff95c]          PyRun_SimpleFileExFlags                            + 0x1bc
[0x63e7d6]          Py_Main                                            + 0x456
[0x4cfe41]          main                                               + 0xe1
[0x7fb0d2379830]    __libc_start_main                                  + 0xf0
[0x5d5f29]          _start                                             + 0x29

Process finished with exit code 1

rajat95 commented 6 years ago

Even just exporting vgg16 from keras.applications as model = vgg16.VGG16(weights='imagenet', include_top=False) leads to the above-mentioned error.
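
For anyone trying to reproduce this, the call can be wrapped in a small script. This is only a minimal sketch, assuming multi-backend Keras 2.x with CNTK installed. The function name build_vgg16_features is made up for illustration. KERAS_BACKEND is the environment variable Keras reads to pick its backend, and it has to be set before keras is imported.

```python
import os

# Select the CNTK backend before keras is imported; setdefault leaves an
# already exported KERAS_BACKEND untouched.
os.environ.setdefault("KERAS_BACKEND", "cntk")


def build_vgg16_features():
    """Build VGG16 without its classifier head (hypothetical helper name).

    keras is imported lazily so that defining this function does not
    require keras or a GPU to be present.
    """
    from keras.applications import vgg16

    # Same call as in the comment above; this downloads the ImageNet
    # weights on first use.
    return vgg16.VGG16(weights='imagenet', include_top=False)
```

Calling build_vgg16_features() on an affected machine should be enough to see whether the CUBLAS error shows up without any training loop involved.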

rajat95 commented 6 years ago

Strangely, the problem with VGG was resolved for me by simply running the program with sudo.
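
If sudo makes the error go away, one possible explanation (an assumption, nothing in this thread confirms it) is that the normal user lacks read/write access to the NVIDIA device nodes. A rough sketch for checking that on Linux, using only the standard library. The helper name nvidia_device_access is made up for illustration.

```python
import glob
import os


def nvidia_device_access(pattern="/dev/nvidia*"):
    """Map each matching device node to True/False for read+write access.

    Assumption: on Linux the driver exposes nodes such as /dev/nvidia0
    and /dev/nvidiactl. Returns an empty dict if nothing matches.
    """
    report = {}
    for path in sorted(glob.glob(pattern)):
        # Skip directories that may match the pattern; only nodes matter.
        if os.path.isdir(path):
            continue
        report[path] = os.access(path, os.R_OK | os.W_OK)
    return report


report = nvidia_device_access()
for path, ok in report.items():
    print(path, "accessible" if ok else "NOT accessible for this user")
if not report:
    print("no /dev/nvidia* device nodes found")
```

Run it once as the normal user and once with sudo. If the results differ, fixing the permissions or group membership is probably a better long-term answer than training as root.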

hendrikschafer commented 5 years ago

Check that all your NVIDIA drivers are up to date (update through NVIDIA GeForce Experience). Since I updated mine, I haven't had any issues!
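
To see which driver version is currently installed before and after updating, something like the following may help. It is a small sketch that assumes the nvidia-smi tool shipped with the driver is on PATH. The helper name query_nvidia_driver is made up for illustration, and it returns None when nvidia-smi is missing or fails.

```python
import shutil
import subprocess


def query_nvidia_driver():
    """Return a list of (gpu_name, driver_version) tuples, or None.

    None means nvidia-smi was not found or did not run successfully.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None

    gpus = []
    for line in result.stdout.splitlines():
        # Split on the last comma so a comma in the GPU name cannot
        # shift the driver version field.
        parts = [p.strip() for p in line.rsplit(",", 1)]
        if len(parts) == 2:
            gpus.append((parts[0], parts[1]))
    return gpus


info = query_nvidia_driver()
print(info if info is not None else "nvidia-smi not available")
```

The printed version can then be compared against the latest driver listed for the card on NVIDIA's download page.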