Fail to train policy network

I am training the model without any modification to your code, except that I uncommented the lines needed to use PyTorch. However, training fails with an error right after batch i:5. The full output is below.

C:\Users\xxx\Desktop\AlphaZero_Gomoku\policy_value_net_pytorch.py:51: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  x_act = F.log_softmax(self.act_fc1(x_act))
C:\Users\xxx\Miniconda3\lib\site-packages\torch\nn\functional.py:1320: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
batch i:1, episode_len:11
batch i:2, episode_len:17
batch i:3, episode_len:14
batch i:4, episode_len:19
batch i:5, episode_len:13
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    training_pipeline.run()
  File "train.py", line 173, in run
    loss, entropy = self.policy_update()
  File "train.py", line 108, in policy_update
    self.learn_rate*self.lr_multiplier)
  File "C:\Users\xxx\Desktop\AlphaZero_Gomoku\policy_value_net_pytorch.py", line 148, in train_step
    return loss.data[0], entropy.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

How can I fix this problem? Should I replace loss.data[0] and entropy.data[0] with tensor.item(), as the error message suggests?
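
For reference, here is a minimal sketch of the change the error message points to. It assumes the failure comes from PyTorch 0.4 or later, where a scalar loss is a 0-dim tensor that cannot be indexed with [0] and must be read with .item(). The helper name to_python_number is hypothetical and is not part of the repository. The fallback branch for older PyTorch versions is also an assumption, included only so the same code runs on both.

```python
def to_python_number(t):
    """Return a plain Python number from a scalar tensor-like object.

    Hypothetical helper, not part of AlphaZero_Gomoku.
    """
    if hasattr(t, "item"):
        # PyTorch >= 0.4: scalar results are 0-dim tensors; .item() extracts the value.
        return t.item()
    # Assumed fallback for older PyTorch, where a scalar loss was a
    # 1-element tensor and loss.data[0] was the usual way to read it.
    return t.data[0]


# Optional demonstration, skipped when PyTorch is not installed.
try:
    import torch
except ImportError:
    torch = None

if torch is not None:
    loss = torch.tensor(0.25)          # a 0-dim tensor, like a scalar loss
    print(to_python_number(loss))      # plain Python float, not a tensor
```

With a helper like this, the last line of train_step would become `return to_python_number(loss), to_python_number(entropy)`, or simply `return loss.item(), entropy.item()` if only recent PyTorch versions need to be supported. The two UserWarnings are separate from the crash. They can be silenced as the warnings themselves suggest, by passing an explicit dim to F.log_softmax (dim=1 is an assumption here, for a batch-by-actions output) and by calling torch.tanh instead of F.tanh.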

junxiaosong / AlphaZero_Gomoku

Fail to train policy network #82