Hi Guillaume,
First of all, thanks for the enlightening work on PRIMAL.
I cloned the code, installed all dependencies, and compiled cpp_mstar. Model inference works fine, so I proceeded to train my own model, using a .py file converted from the .ipynb notebook. Training starts and runs, but after a random number of episodes it hangs: the GPU and memory remain occupied while the CPU sits idle. The training program reports no error or exception. This happens on almost every training run.
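In case it helps with diagnosis, here is a minimal sketch of a watchdog that could wrap each episode and dump the stack of every Python thread if the episode runs longer than a threshold. It uses only the standard `faulthandler` module. The names `run_episode_with_watchdog` and `run_episode`, and the 600-second threshold, are hypothetical and not part of the PRIMAL code.

```python
import faulthandler
import sys

# Assumed threshold: an episode taking longer than this is treated as hung.
HANG_TIMEOUT_S = 600


def run_episode_with_watchdog(run_episode, timeout_s=HANG_TIMEOUT_S, log_file=None):
    """Call run_episode(); if it is still running after timeout_s seconds,
    write the traceback of every Python thread to log_file (default: stderr).

    run_episode is a hypothetical zero-argument callable standing in for
    one training episode.
    """
    if log_file is None:
        log_file = sys.stderr
    # Arm a background watchdog; repeat=True dumps again every timeout_s
    # seconds for as long as the episode stays stuck.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return run_episode()
    finally:
        # Disarm the watchdog once the episode finishes normally.
        faulthandler.cancel_dump_traceback_later()
```

The dump only shows where the Python threads are blocked (for example inside a queue `get`, a lock, or a call into compiled code), so it would narrow down the location of the hang rather than explain it.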
What I have modified is:
I have already created a conda environment for PRIMAL with cuda=10.0, cudnn=7.6.5, and tensorflow-gpu=1.14.
It would be great if you could help me track down this issue.
Best Wishes, Hongjun