danialvi opened 2 years ago
I got this error now:
InternalError Traceback (most recent call last)
C:\ProgramData\anaconda3\envs\airsim\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
1321 try:
-> 1322 return fn(*args)
1323 except errors.OpError as e:

C:\ProgramData\anaconda3\envs\airsim\lib\site-packages\tensorflow\python\client\session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1306 return self._call_tf_sessionrun(
-> 1307 options, feed_dict, fetch_list, target_list, run_metadata)
1308

C:\ProgramData\anaconda3\envs\airsim\lib\site-packages\tensorflow\python\client\session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1408 self._session, options, feed_dict, fetch_list, target_list,
-> 1409 run_metadata)
1410 else:

InternalError: Blas GEMM launch failed : a.shape=(30, 64), b.shape=(64, 10), m=30, n=10, k=64
[[Node: dense2/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Nadam/gradients/dropout_2/cond/Merge_grad/cond_grad"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dropout_2/cond/Merge, dense2/kernel/read)]]
[[Node: loss/mul/_129 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1107_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
@mitchellspryn please help
@adshar
Here is the list of dependencies in my Anaconda environment: depencies.txt
I am not at MSFT currently, so I am not actively supporting this repo anymore.
That said, I took a look at your stack trace. It looks like CUDA isn't installed properly. Relevant portion:
InternalError: Blas GEMM launch failed : a.shape=(30, 64), b.shape=(64, 10), m=30, n=10, k=64
[[Node: dense2/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Nadam/gradients/dropout_2/cond/Merge_grad/cond_grad"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dropout_2/cond/Merge, dense2/kernel/read)]]
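To isolate the failing operation, it can help to rerun just that GEMM: a float32 MatMul of a (30, 64) matrix by a (64, 10) matrix, pinned to `/gpu:0`. The sketch below assumes the TensorFlow 1.x graph API (`tf.Session`, `tf.ConfigProto`) that ships with tensorflow-gpu 1.8.0; `make_operands` and `gpu_matmul` are hypothetical helper names. If this alone raises the same `Blas GEMM launch failed` error, the problem most likely lies in the CUDA/cuBLAS setup rather than in the model code.

```python
# Minimal sketch, assuming TensorFlow 1.x (e.g. tensorflow-gpu 1.8.0):
# rerun the same float32 GEMM shape as the failing dense2/MatMul node on the GPU.
import numpy as np


def make_operands(seed=0):
    """Random float32 operands with the shapes from the error: a (30, 64), b (64, 10)."""
    rng = np.random.RandomState(seed)
    a = rng.standard_normal((30, 64)).astype(np.float32)
    b = rng.standard_normal((64, 10)).astype(np.float32)
    return a, b


def gpu_matmul(a, b):
    """Compute a @ b with the op pinned to /gpu:0 in a TF 1.x graph session."""
    import tensorflow as tf  # imported here so make_operands works without TF

    with tf.Graph().as_default():
        with tf.device("/gpu:0"):
            c = tf.matmul(tf.constant(a), tf.constant(b))
        # log_device_placement prints which device each op was placed on
        config = tf.ConfigProto(log_device_placement=True)
        with tf.Session(config=config) as sess:
            return sess.run(c)


if __name__ == "__main__":
    a, b = make_operands()
    c_gpu = gpu_matmul(a, b)  # raises InternalError if cuBLAS cannot launch the GEMM
    print("matches NumPy:", np.allclose(c_gpu, a @ b, atol=1e-4))
```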
I'd check whether you can run any Keras training operation at all: for example, train a linear model on some random data points and see whether forward and backpropagation work properly. My guess is that they won't, and that will help you debug what is going on with your CUDA install.
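The check suggested above could look roughly like this. It is a sketch assuming standalone Keras 2.x on the TensorFlow backend (the keras 2.1.2 used in this environment); `make_linear_data` and `keras_sanity_check` are hypothetical names, and the data are synthetic, generated from a random true weight vector and a made-up bias of 0.5. A single `Dense` unit exercises one MatMul in the forward pass and its gradient in the backward pass.

```python
# Minimal sketch, assuming standalone Keras 2.x on the TensorFlow backend.
# Fits y = x @ w + bias on random data with a single Dense unit.
import numpy as np


def make_linear_data(n=256, dim=10, seed=0):
    """Random float32 inputs x and noise-free linear targets y = x @ w + bias."""
    rng = np.random.RandomState(seed)
    x = rng.standard_normal((n, dim)).astype(np.float32)
    w = rng.standard_normal((dim, 1)).astype(np.float32)
    bias = np.float32(0.5)  # arbitrary illustrative bias
    y = x @ w + bias
    return x, y


def keras_sanity_check(epochs=20):
    """Train a linear model and return the per-epoch training loss."""
    # imported here so the data helper works without Keras installed
    from keras.models import Sequential
    from keras.layers import Dense

    x, y = make_linear_data()
    model = Sequential()
    model.add(Dense(1, input_dim=x.shape[1]))
    model.compile(optimizer="sgd", loss="mse")
    history = model.fit(x, y, epochs=epochs, batch_size=32, verbose=0)
    return history.history["loss"]


if __name__ == "__main__":
    losses = keras_sanity_check()
    print("first loss %.4f, last loss %.4f" % (losses[0], losses[-1]))
```

If training works, the loss should drop steadily across epochs; if the GPU setup is broken, `fit` would likely fail with an `InternalError` similar to the one quoted.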
Thank you for answering. I have tried reinstalling to check whether it has something to do with CUDA. I also tried installing cudatoolkit and cuDNN before installing TensorFlow, following these steps:
conda install cudatoolkit=9.0
conda install cudnn=7.1.4=cuda9.0_0
conda install -c anaconda tensorflow-gpu=1.8.0
conda install -c anaconda keras-gpu=2.1.2
python -m pip install --upgrade pip
conda update -n base conda
pip install msgpack-rpc-python
pip uninstall tornado
conda install -c conda-forge tornado=4.5.3
conda install jupyter
pip install matplotlib==2.1.2
pip install image
pip install keras_tqdm
conda install -c conda-forge opencv
conda install pandas
pip install --upgrade numpy==1.16.4
conda install scipy
pip install opencv-python
pip install --upgrade h5py==2.10.0
python -m ipykernel install --user
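One way to confirm that these steps actually produced the pinned versions and a CUDA-enabled TensorFlow build is a short check like the one below. It is a sketch assuming the TF 1.x `tf.test` helpers `is_built_with_cuda()` and `is_gpu_available()`; `PINNED`, `version_mismatches` and `installed_report` are illustrative names, with the pins copied from the commands above.

```python
# Hedged sketch: compare installed versions against the pins used in the
# conda steps, and report whether TensorFlow was built with CUDA and sees a GPU.
PINNED = {"tensorflow": "1.8.0", "keras": "2.1.2"}  # from the install commands


def version_mismatches(found, expected=PINNED):
    """Return {package: (found, expected)} for every package whose version differs."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


def installed_report():
    """Collect versions and CUDA/GPU flags from the active environment (TF 1.x API)."""
    import tensorflow as tf
    import keras

    versions = {"tensorflow": tf.__version__, "keras": keras.__version__}
    flags = {
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_available": tf.test.is_gpu_available(),
    }
    return versions, flags


if __name__ == "__main__":
    versions, flags = installed_report()
    print("versions:", versions, "mismatches:", version_mismatches(versions))
    print("flags:", flags)
```

An empty mismatch dictionary together with `gpu_available` being `False` would point at the driver/CUDA layer rather than at the Python packages.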
I still have the same problem. Do you have any idea how I can solve this? I have searched a lot, and many people seem to have had the same problem, but none of the solutions worked for me. Since I am using this as part of my master's thesis, my time is also limited.
My training does not start. I am using Python 3.6 with tensorflow-gpu 1.8.0 and Keras 2.1.2, and my computer has a GeForce RTX 3060, so the GPU should not be the problem. I also installed Norton Antivirus on this new computer. On my older computer, which has a weaker GPU and ran Panda Dome, training did run, but after more than an hour it was only at 1%. That is why I bought a new computer with a good GPU and CPU. Some of this work will be presented in my master's thesis, so I would appreciate any help soon.
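Whether TensorFlow can actually use the card can be checked by listing the devices it registers. This sketch assumes TF 1.x's `device_lib.list_local_devices()` and that each GPU's `physical_device_desc` contains `compute capability: X.Y`; `parse_compute_capability` and `visible_gpus` are hypothetical helpers. It may be worth running because a GeForce RTX 3060 has compute capability 8.6, which NVIDIA first supported in CUDA 11.1, whereas the install steps earlier in this thread use CUDA 9.0.

```python
# Hedged sketch for TF 1.x: list the GPUs TensorFlow registers and extract each
# one's compute capability. The regex assumes the description string contains
# "compute capability: X.Y".
import re

_CC_PATTERN = re.compile(r"compute capability:\s*(\d+)\.(\d+)")


def parse_compute_capability(desc):
    """Return (major, minor) parsed from a physical_device_desc string, or None."""
    match = _CC_PATTERN.search(desc or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def visible_gpus():
    """List (device name, compute capability) for each GPU TensorFlow registered."""
    from tensorflow.python.client import device_lib

    return [
        (d.name, parse_compute_capability(d.physical_device_desc))
        for d in device_lib.list_local_devices()
        if d.device_type == "GPU"
    ]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(gpus if gpus else "TensorFlow sees no GPU")
```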