Closed: sherpal closed this issue 3 years ago
Hi @sherpal, I've also encountered this issue multiple times. The cause is that the model checkpoints are saved in multiple files, as indicated by the .data-XXX...
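To illustrate the point about checkpoints being split across several files, here is a minimal sketch, not the project's actual code. It assumes the checkpoint is written in TensorFlow's prefix-based layout, where a prefix such as `best` produces `best.index` plus one or more `best.data-*` shard files and no file named exactly `best`. The helper names `checkpoint_files` and `checkpoint_exists` are hypothetical.

```python
import glob
import os
from typing import List


def checkpoint_files(prefix: str) -> List[str]:
    """List the files on disk that belong to the checkpoint with this prefix.

    Assumption: the checkpoint consists of `<prefix>.index` and one or more
    `<prefix>.data-*` shard files.
    """
    index_file = prefix + ".index"
    # glob.escape protects against special characters in the directory name.
    shards = sorted(glob.glob(glob.escape(prefix) + ".data-*"))
    found = [index_file] if os.path.isfile(index_file) else []
    return found + shards


def checkpoint_exists(prefix: str) -> bool:
    """Return True if both the index file and at least one shard are present."""
    has_index = os.path.isfile(prefix + ".index")
    has_shard = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_shard
```

A check such as `os.path.isfile(prefix)` would return False for a checkpoint saved this way, which is one way a "file not found" style error can appear even though the files are in the folder.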
This is new behaviour in TensorFlow > 2.1, which we did not test. A quick fix would be to downgrade to:
Indeed, downgrading (almost) worked.
There was still a catch with the h5py package, which made a breaking change in its 3.x version, so I hit the following issue: https://github.com/keras-team/keras/issues/14265
But running
pip uninstall h5py
pip install h5py==2.10
fixed it.
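For anyone who wants to check their environment before starting a long training run, here is a small sketch of a version check. It only encodes what is reported above (h5py 3.x caused the problem, 2.10 worked); the function names are hypothetical, and it uses the standard library's `importlib.metadata` (Python 3.8+) rather than importing h5py itself.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Extract the leading major number from a version string like '2.10.0'."""
    return int(version.split(".")[0])


def h5py_needs_pin(version: str) -> bool:
    """True for h5py 3.x or later, the series reported in this thread as breaking."""
    return major_version(version) >= 3


def installed_h5py_version() -> Optional[str]:
    """Return the installed h5py version, or None if h5py is not installed."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_h5py_version()
    if version is None:
        print("h5py is not installed")
    elif h5py_needs_pin(version):
        print("h5py " + version + " found; consider: pip install h5py==2.10")
    else:
        print("h5py " + version + " found; no pin needed")
```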
Hello,
I'm trying to launch the training for "hex" on my machine. The command I'm using is
I haven't touched anything in the configuration, so the settings are the ones from master.
The 50 self-play iterations run successfully, and so do the 100 iterations of back-propagation. However, once training finishes, I get the following error:
Indeed, the files I have in that folder are the following:
Here are the versions of the libs I use:
I'm running on Windows 10 with CUDA 11 and, if it matters, a GTX 1070 GPU.