vermouthdky / SimTeG

Official Repo of SimTeG
MIT License
30 stars 5 forks

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #12

Open Heycen opened 1 month ago

Heycen commented 1 month ago

Hi, thank you so much for all your work. When I try to reproduce step 2 of the example, "Then we train a GraphSAGE on top of the generated embeddings:", I get the following error.

Traceback (most recent call last):
  File "/home/cen/workdir/SimTeG/main.py", line 140, in <module>
    main(args)
  File "/home/cen/workdir/SimTeG/main.py", line 116, in main
    test_acc, val_acc = train(args, return_value="test")
  File "/home/cen/workdir/SimTeG/src/run.py", line 107, in train
    data, split_idx, evaluator = load_data(args)
  File "/home/cen/workdir/SimTeG/src/run.py", line 40, in load_data
    data, split_idx, evaluator = load_data_bundle(
  File "/home/cen/workdir/SimTeG/src/dataset/__init__.py", line 22, in load_data_bundle
    dataset = load_dataset(name, root=root, tokenizer=tokenizer, tokenize=tokenize)
  File "/home/cen/workdir/SimTeG/src/dataset/__init__.py", line 18, in load_dataset
    return datasets[name](root=root, tokenizer=tokenizer, tokenize=tokenize)
  File "/home/cen/workdir/SimTeG/src/dataset/ogbn_arxiv.py", line 56, in __init__
    super(OgbnArxivWithText, self).__init__(name, meta_info, root, transform, pre_transform, tokenizer, tokenize)
  File "/home/cen/workdir/SimTeG/src/dataset/ogb_with_text.py", line 52, in __init__
    dist.barrier()
  File "/opt/anaconda3/envs/simteg/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 3144, in barrier
    default_pg = _get_default_group()
  File "/opt/anaconda3/envs/simteg/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 584, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

ECMGit commented 1 month ago

Same issue here. You can comment out dist.barrier() at line 52 of "/home/cen/workdir/SimTeG/src/dataset/ogb_with_text.py".

However, when I do this, the trainer uses only the CPU to train.
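
Instead of deleting the barrier as in the workaround above, it may be possible to keep it and skip it only when no process group exists. The following is a minimal sketch, not code from the SimTeG repository: the helper name `barrier_if_distributed` is hypothetical, and it relies only on `torch.distributed.is_available()`, `torch.distributed.is_initialized()`, and `torch.distributed.barrier()`. It does not address the CPU-only training mentioned above, whose cause is not established in this thread.

```python
import torch.distributed as dist


def barrier_if_distributed() -> bool:
    """Synchronize processes only when a default process group exists.

    Hypothetical helper (not part of SimTeG). Returns True if a barrier
    was actually executed, False if it was skipped.
    """
    # is_available(): this PyTorch build includes distributed support.
    # is_initialized(): init_process_group() has already been called.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
        return True
    # Plain single-process run (e.g. `python main.py`): nothing to sync.
    return False
```

Under this assumption, the bare `dist.barrier()` call at line 52 of `ogb_with_text.py` would be replaced by `barrier_if_distributed()`, so a run started without a distributed launcher no longer raises the `RuntimeError`, while a distributed run still synchronizes.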