Closed godmoves closed 6 years ago
I ran into this problem too:
~/elf/scripts/elfgames/go$ ./game.sh
./gtp.sh: line 18: 6651 Segmentation fault (core dumped) game=elfgames.go.game model=df_pred model_file=elfgames.go.df_model3 python3 df_console.py --mode online --keys_in_reply V rv --use_mcts --mcts_verbose_time --mcts_use_prior --mcts_persistent_tree --load $MODEL --server_addr localhost --port 1234 --replace_prefix resnet.module,resnet --no_check_loaded_options --no_parameter_print --leaky_relu "$@"
It seems the error comes from the elf module in src_py/elf, @pangafu.
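One way to narrow down where the crash happens is Python's built-in faulthandler, which prints the Python-level traceback when the interpreter receives SIGSEGV. Below is a minimal sketch, not something from the ELF repo: the file name trace_segv.py and both function names are made up for illustration. It only shows which Python line called into native code. It does not give a C++ backtrace of the extension itself.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a child killed by signal N is reported as -N.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status " + str(returncode)


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    stdin, stdout and stderr are inherited, so an interactive script keeps
    working and the faulthandler traceback shows up in the terminal.
    """
    cmd = [sys.executable, "-X", "faulthandler"] + list(script_args)
    return subprocess.run(cmd).returncode


# trace_segv.py is a hypothetical file name for this sketch.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe_exit(run_with_faulthandler(sys.argv[1:])), file=sys.stderr)
```

To try it, replace python3 df_console.py in the failing command with python3 trace_segv.py df_console.py and keep the environment variables and the remaining arguments unchanged. If the last Python frame printed is an import or a call inside src_py/elf, that would support the guess that the crash is in the elf module.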
@godmoves How did you resolve this? My machine is also running Ubuntu 17.10, and every script ends in a core dump:
./start_selfplay.sh
./start_selfplay.sh: line 57: 13961 Segmentation fault (core dumped) game=elfgames.go.game model=df_pred model_file=elfgames.go.df_model3 python3 selfplay.py --mode selfplay --selfplay_timeout_usec 10 --batchsize $BATCHSIZE --mcts_rollout_per_batch $BATCHSIZE --num_games 1 --keys_in_reply V rv --port 2341 --server_id myserver --mcts_threads 2 --mcts_rollout_per_thread $NUM_ROLLOUTS --use_mcts --use_mcts_ai2 --mcts_use_prior --mcts_persistent_tree --mcts_puct 1.5 --batchsize2 $BATCHSIZE2 --white_mcts_rollout_per_batch $BATCHSIZE2 --white_mcts_rollout_per_thread $NUM_ROLLOUTS2 --eval_model_pair loaded --policy_distri_cutoff 0 --mcts_virtual_loss 1 --mcts_epsilon 0.0 --mcts_alpha 0.00 --resign_thres 0.05 --num_block0 $NUM_BLOCK --dim0 $DIM --num_block1 $NUM_BLOCK --dim1 $DIM --no_check_loaded_options0 --no_check_loaded_options1 --verbose --gpu $GPU --load0 $LOAD0 --load1 $LOAD1 --use_fp160 --use_fp161 --gpu $GPU --replace_prefix0 resnet.module,resnet --replace_prefix1 resnet.module,resnet "$@"
I don't know how to solve this issue, but it seems to occur only on Ubuntu 17.10. Both 16.04 and 18.04 work fine.
UPDATE: Using the nightly build of PyTorch solves this problem:
conda install -c pytorch pytorch-nightly
If you are using a Volta GPU, use:
conda install -c pytorch pytorch-nightly cuda90
(The README has been updated accordingly.)
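After installing, it may help to confirm which PyTorch build Python actually picks up before launching ELF again. The following is a rough sketch using standard torch attributes. The function names are made up for illustration, and the Volta check assumes Volta GPUs report compute capability 7.0 (or 7.2 on Xavier boards).

```python
def torch_build_info():
    """Return basic facts about the installed PyTorch build, or None if absent."""
    try:
        import torch
    except ImportError:
        return None
    info = {
        "version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "capability": None,
    }
    if info["cuda_available"]:
        # (major, minor) compute capability of GPU 0
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


def is_volta(capability):
    """Assumption: Volta parts report compute capability 7.0 or 7.2."""
    return capability is not None and tuple(capability) in {(7, 0), (7, 2)}


if __name__ == "__main__":
    info = torch_build_info()
    if info is None:
        print("PyTorch is not importable from this Python environment.")
    else:
        for key, value in info.items():
            print(key, "=", value)
        if is_volta(info["capability"]):
            print("Volta GPU detected; the cuda90 variant above applies.")
```

If the reported version is not a nightly build, the conda environment that ELF runs in is probably not the one the install command was run in.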
I compiled PyTorch and ELF Go on an Ubuntu 17.10 machine, and when I try to launch the program I get this error:
I tested PyTorch, and it works fine. Any idea where the error comes from?