Hello, is your code running successfully? Can I see your configuration?

TiAmo888 commented 6 months ago

Has anyone successfully set up their environment so that the code runs? I keep getting a CUDNN_STATUS_EXECUTION_FAILED error, along with an internal error: "Failed to call ThenRnnForward with model config".

I am running the algorithm in a Jupyter Notebook on my NVIDIA GeForce RTX 3050 Ti GPU using: Python=3.6.13, TensorFlow=1.15.0, keras=2.3.1, cudatoolkit=10.0.130, cuDNN library=7.6.1. I appreciate any advice you can provide!

Originally posted by @jacobmorgan2023 in https://github.com/zhry10/PhyLSTM/issues/2#issuecomment-1839684579

jacobmorgan2023 commented 6 months ago

My code is running successfully! I ended up running it on a different machine, with Windows 10 and an NVIDIA GeForce GTX 1650 GPU. Since the version of TensorFlow used to deploy the model is outdated, I set up a virtual environment in Anaconda from the attached YAML file (in a ZIP, since GitHub doesn't allow YAML attachments): lstm.zip

Once I created the environment, I used conda-forge for the following installations:

conda install scikit-learn
conda install matplotlib
conda install tensorflow-gpu==1.15
conda install pillow lxml jupyter opencv
conda install pathlib
conda install keras==2.3.1
conda install tensorflow-estimator=1.15

Double-check that your TensorFlow dependencies are all at version 1.15. I had to install TensorFlow-Estimator manually at the end because it had somehow been installed at the wrong version.
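The version check can also be scripted. The following is a minimal sketch, not part of the PhyLSTM repository: `installed_version` and `find_mismatches` are hypothetical helper names, and the distribution names in `EXPECTED` are assumptions (under conda, TensorFlow may be registered as `tensorflow-gpu` rather than `tensorflow`, so adjust the keys to match what `conda list` shows).

```python
# Expected version prefixes, taken from the versions mentioned in this thread.
# Assumption: these are the names the packages are registered under.
EXPECTED = {
    "tensorflow": "1.15",
    "tensorflow-estimator": "1.15",
    "keras": "2.3.1",
}


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6 / 3.7
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (found, wanted)} for each missing or off-version package."""
    problems = {}
    for package, wanted in expected.items():
        found = installed.get(package)
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            problems[package] = (found, wanted)
    return problems


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in EXPECTED}
    for package, (found, wanted) in find_mismatches(installed).items():
        print("{}: found {}, expected {}".format(package, found, wanted))
```

Run inside the activated environment, the script prints nothing when every package matches and one line per package otherwise.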

I used the LSTM, PhyCNN, and PhyLSTM algorithms for my thesis. I am still preparing my manuscript and my GitHub repo for publication, but I expect to be done and have it available in the coming weeks. If you have further questions feel free to reach out!

TiAmo888 commented 6 months ago

Hello, I have configured it according to your environment. What is the reason for the following error?

tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Fail to find the dnn implementation.
[[node sequential_1/cu_dnnlstm_1/CudnnRNN (defined at C:\Users\Administrator.conda\envs\tensorflow1.15.0\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
[[gradients_1/sequential_1/cu_dnnlstm_2/CudnnRNN_grad/CudnnRNNBackprop/_134]]

jacobmorgan2023 commented 6 months ago

I’m pretty sure your error has to do with the installation or recognition of NVIDIA cuDNN and/or the CUDA Toolkit in your virtual environment. Do they appear as installed in your virtual environment? If not, you can find both through NVIDIA. I installed them locally: cuDNN via https://developer.nvidia.com/rdp/cudnn-archive and the CUDA Toolkit v10.0.130 via https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal. For cuDNN I used version 7.6.5 for CUDA 10.0 (November 15, 2019). If these are already installed, you may need to define your CUDA_PATH environment variable; see https://docs.nvidia.com/gameworks/content/developertools/desktop/environment_variables.htm.
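A quick way to check whether CUDA_PATH is defined and whether the cuDNN library sits where TensorFlow can find it is a few lines of Python. This is only a sketch under stated assumptions: `diagnose_cuda_setup` is a hypothetical helper, and it assumes that cuDNN 7.x on Windows ships its library as `cudnn64_7.dll` and that it was copied into the `bin` folder of the CUDA Toolkit.

```python
import os

# Assumption: cuDNN 7.x for Windows names its library cudnn64_7.dll and it
# has been copied into <CUDA_PATH>\bin during a manual install.
CUDNN_DLL = "cudnn64_7.dll"


def diagnose_cuda_setup(env=None, file_exists=os.path.isfile):
    """Return a list of readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if not cuda_path:
        return ["CUDA_PATH is not defined"]

    problems = []
    dll = os.path.join(cuda_path, "bin", CUDNN_DLL)
    if not file_exists(dll):
        problems.append("cuDNN library not found at " + dll)
    return problems


if __name__ == "__main__":
    found = diagnose_cuda_setup()
    print("\n".join(found) if found else "CUDA_PATH and cuDNN library look OK")
```

An empty result only means the files are where this sketch expects them; it does not prove that the GPU driver is compatible with these versions.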

It may also have to do with your GPU drivers. I purchased the machine I used for my experiments back in 2019, so if your machine is a lot newer, it may not be able to run versions that old. If you switch CuDNNLSTM to LSTM in the code, I think it will still train, but it will take a while since it will be using the CPU. The PhyLSTM took me several hours to train even with a GPU.

TiAmo888 commented 6 months ago

Thank you, with your cuDNN version configuration the code now runs successfully. Thanks again for your answer. However, I see a couple of undefined parameters in the code. Can you tell me what those parameters mean? (Screenshots: Snipaste_2024-04-29_10-08-05, Snipaste_2024-04-29_10-08-24)

zhry10 / PhyLSTM

Hello, is your code running successfully? Can I see your configuration #3