plkmo / AlphaZero_Connect4

PyTorch implementation of AlphaZero Connect from scratch (with results)
Apache License 2.0

Error during the first run #5

Open yaniferhaoui opened 4 years ago

yaniferhaoui commented 4 years ago

Hello, I ran into a problem during the first run:

python3 main_pipeline.py

I get:

"alpha_net_c4.py", line 14, in __init__
    self.X = dataset[:,0]
IndexError: too many indices for array

Can someone help me? Thanks for your time.

plkmo commented 4 years ago

Can you post the full error traceback, please? Meanwhile, try updating numpy and see if the problem persists.

daisukeadachi commented 4 years ago

Hi, I also encountered the same issue as @yaniferhaoui. I tried updating numpy to version 1.19.1, but the issue persists. Here is my full error traceback:

Traceback (most recent call last):
  File "main_pipeline.py", line 36, in <module>
    train_connectnet(args, iteration=i, new_optim_state=True)
  File "/Users/daisukeadachi/Documents/GitHub/AlphaZero_Connect4/src/train_c4.py", line 153, in train_connectnet
    train(net, datasets, optimizer, scheduler, start_epoch, 0, args, iteration)
  File "/Users/daisukeadachi/Documents/GitHub/AlphaZero_Connect4/src/train_c4.py", line 67, in train
    train_set = board_data(dataset)
  File "/Users/daisukeadachi/Documents/GitHub/AlphaZero_Connect4/src/alpha_net_c4.py", line 14, in __init__
    self.X = dataset[:,0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Thank you.

nmuh0002 commented 3 years ago

Hi, may I know whether you have a solution to this error? I am getting the same error.

Thanks

sha256feng commented 3 years ago

I printed the dataset shape: it is (0,), so the dataset is empty. Note that the main pipeline runs these two steps in order:

run_MCTS(args, start_idx=0, iteration=i)
train_connectnet(args, iteration=i, new_optim_state=True)

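The error in train_connectnet() originates from the empty results of run_MCTS(): indexing a 1-dimensional, empty array with dataset[:,0] raises the IndexError. Apparently, MCTS is not working, so no training data is produced.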
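The diagnosis above can be reproduced without the repository. The sketch below shows that an empty array of shape (0,) raises the same IndexError when indexed with two indices, and adds a guard that fails with a clearer message. The helper name `check_dataset` and the three-column demo rows are hypothetical and not taken from the repository; only the `dataset[:,0]` indexing comes from the traceback.

```python
import numpy as np


def check_dataset(dataset):
    """Hypothetical guard: make sure the dataset is a non-empty 2-D array
    before any column indexing such as dataset[:, 0] is attempted."""
    arr = np.asarray(dataset, dtype=object)
    if arr.ndim != 2 or arr.shape[0] == 0:
        raise ValueError(
            f"expected a non-empty 2-D dataset, got shape {arr.shape}; "
            "did run_MCTS() produce any data?"
        )
    return arr


# Reproduce the failure: an empty array is 1-dimensional with shape (0,).
empty = np.array([])
print("shape of empty dataset:", empty.shape)
try:
    empty[:, 0]
except IndexError as err:
    print("IndexError:", err)

# The guard reports the real cause instead of an indexing error.
try:
    check_dataset(empty)
except ValueError as err:
    print("ValueError:", err)

# Placeholder rows (assumed layout, for illustration only).
rows = [("state0", "policy0", 1.0), ("state1", "policy1", -1.0)]
good = check_dataset(rows)
print("first column:", list(good[:, 0]))
```

Calling a check like this between run_MCTS() and train_connectnet() would surface the empty self-play output directly, rather than as an indexing error inside the dataset class.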

sha256feng commented 3 years ago

The empty dataset described above happened because I actually had an NVIDIA NVML driver/library version mismatch:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/THCGeneral.cpp line=50 error=804 : forward compatibility was attempted on non supported HW

I solved it by rebooting. The code now works like magic.

Solving the torch CUDA error 804 by rebooting:
https://github.com/pytorch/pytorch/issues/40671
https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch/45319156#45319156
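Since the root cause here was a broken CUDA setup rather than the training code, a quick pre-flight check before running main_pipeline.py may help catch it early. This is a minimal sketch, assuming PyTorch is installed; the function name `cuda_is_usable` is hypothetical, and whether error 804 surfaces as `torch.cuda.is_available()` returning False or as a RuntimeError on first use may depend on the PyTorch and driver versions.

```python
def cuda_is_usable():
    """Hypothetical pre-flight check: return True only if PyTorch can
    actually allocate a tensor on the GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so CUDA cannot be used from it.
        return False

    if not torch.cuda.is_available():
        return False

    try:
        # Force real CUDA initialisation; driver problems tend to show up here.
        torch.zeros(1, device="cuda")
        torch.cuda.synchronize()
    except RuntimeError:
        return False
    return True


if __name__ == "__main__":
    print("CUDA usable:", cuda_is_usable())
```

If this prints False on a machine that has a GPU, checking the driver state (for example with nvidia-smi) or rebooting, as described above, is a reasonable first step before debugging the pipeline itself.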