Closed skye95git closed 2 years ago
When I evaluate a pre-trained model using the script graphormer/evaluate/evaluate.py:
python evaluate.py \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name pcqm4m \
--dataset-source ogb \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_base \
--num-classes 1 \
--batch-size 64 \
--pretrained-model-name pcqm4mv1_graphormer_base \
--load-pretrained-model-output-layer \
--split valid \
--seed 1
The command hangs with no output and no error. Is this a problem with downloading the pre-trained model? And if the download failed, why is no error raised?
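One way to see where a silent run like this is stuck, without having to interrupt it, is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. A minimal sketch, assuming you launch evaluate.py in-process through a small wrapper; the helper name run_with_watchdog and the 600-second default are hypothetical and not part of Graphormer:

```python
import faulthandler
import os
import runpy
import sys


def run_with_watchdog(script_path, argv, timeout_s=600.0):
    """Run a Python script as __main__ inside this process (hypothetical helper).

    While the script is still running, the stacks of all threads are dumped
    to the real stderr every `timeout_s` seconds, so a silent hang (a stalled
    download, a blocked import, ...) shows where it is waiting.
    """
    saved_argv = sys.argv
    saved_path = list(sys.path)
    # Mimic `python script.py args...`: set argv and put the script's
    # directory first on sys.path, as the interpreter would.
    sys.argv = [script_path, *argv]
    sys.path.insert(0, os.path.dirname(os.path.abspath(script_path)))
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=sys.__stderr__)
    try:
        runpy.run_path(script_path, run_name="__main__")
    finally:
        # Always disarm the watchdog and restore interpreter state.
        faulthandler.cancel_dump_traceback_later()
        sys.argv = saved_argv
        sys.path[:] = saved_path
```

For example, from graphormer/evaluate you could call run_with_watchdog("evaluate.py", ["--user-dir", "../../graphormer", ...]) with the same flags as above. Note that faulthandler only sees threads of the current process, not dataloader worker subprocesses started by --num-workers.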
torch.sparse_csr_tensor seems to be available in PyTorch 1.11.0, but install.sh installs PyTorch 1.9.1. Should I upgrade PyTorch to 1.11.0?
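Rather than comparing version numbers, you can ask the installed PyTorch directly whether it exposes the function the code calls. A minimal sketch; the function name torch_feature_report is hypothetical, and it only checks that the attribute exists, not that CSR tensors actually work with your CUDA setup:

```python
import importlib


def torch_feature_report(names=("sparse_csr_tensor",)):
    """Report the installed torch version and whether each attribute exists.

    Hypothetical helper. Returns None if torch cannot be imported here.
    """
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return None
    return {
        "version": str(torch.__version__),
        "features": {name: hasattr(torch, name) for name in names},
    }


report = torch_feature_report()
if report is None:
    print("torch is not installed in this environment")
else:
    print("torch", report["version"])
    for name, present in report["features"].items():
        print(f"torch.{name}: {'available' if present else 'missing'}")
```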
Hi @skye95git, thanks very much for using Graphormer. We have already noticed your issue and plan to look into it. However, it takes time to reproduce every corner case reported across all issues, and we are not full-time maintainers, so please stay tuned if you cannot solve this problem yourself, and kindly stop raising your question in other issue threads. Thanks very much for your understanding.
Sorry, I just wanted to ask someone with hands-on experience for a solution to this problem, because I have tried many things and none of them worked.
Today I tried to run examples/benchmarking-gnns on the v1.0 branch:
[ -z "${exp_name}" ] && exp_name="zinc"
[ -z "${seed}" ] && seed="1"
[ -z "${arch}" ] && arch="--ffn_dim 80 --hidden_dim 80 --num_heads 8 --dropout_rate 0.1 --n_layers 12 --peak_lr 2e-4 --edge_type multi_hop --multi_hop_max_dist 20"
[ -z "${warmup_updates}" ] && warmup_updates="40000"
[ -z "${tot_updates}" ] && tot_updates="400000"
echo -e "\n\n"
echo "=====================================ARGS======================================"
echo "arg0: $0"
echo "arch: ${arch}"
echo "seed: ${seed}"
echo "exp_name: ${exp_name}"
echo "warmup_updates: ${warmup_updates}"
echo "tot_updates: ${tot_updates}"
echo "==============================================================================="
save_path="../../exps/zinc/$exp_name-$warmup_updates-$tot_updates/$seed"
mkdir -p $save_path
CUDA_VISIBLE_DEVICES=0 \
python ../../graphormer/entry.py --num_workers 8 --seed $seed --batch_size 256 \
--dataset_name ZINC \
--gpus 1 --accelerator ddp --precision 16 \
$arch \
--check_val_every_n_epoch 10 --warmup_updates $warmup_updates --tot_updates $tot_updates \
--default_root_dir $save_path
There is an error:
Downloading https://www.dropbox.com/s/feo9qle74kg48gy/molecules.zip?dl=1
Traceback (most recent call last):
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/urllib/request.py", line 1350, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 1036, in _send_output
self.send(msg)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 976, in send
self.connect()
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 1443, in connect
super().connect()
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/http/client.py", line 948, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/socket.py", line 728, in create_connection
raise err
File "/home/linjiayi/anaconda3/envs/graphormer_v1/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable
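"[Errno 101] Network is unreachable" means the machine could not open a connection to Dropbox at all, before Graphormer did anything with the data. A minimal sketch for checking that from the training machine using only the standard library; the helper name is_reachable is hypothetical, and the URL is the one printed in the log above:

```python
import urllib.error
import urllib.request

# URL copied from the log above; other dataset URLs can be checked the same way.
DATASET_URL = "https://www.dropbox.com/s/feo9qle74kg48gy/molecules.zip?dl=1"


def is_reachable(url, timeout=10.0):
    """Return True if a request to `url` gets any HTTP response (hypothetical helper).

    Network-level failures such as "[Errno 101] Network is unreachable",
    DNS errors, refused connections, timeouts and malformed URLs return False.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path works.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If is_reachable(DATASET_URL) returns False on the training machine, the problem is most likely the network or proxy configuration of that machine rather than Graphormer itself.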
I want to use this virtual environment (Python 3.7, PyTorch 1.7.0, cudatoolkit 10.2, torch-geometric 2.0.4) to run examples/property_prediction/zinc.sh on the main branch. I ran:
cd fairseq
pip install . --use-feature=in-tree-build
python setup.py build_ext --inplace
Then I ran:
bash zinc.sh
There is an error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
zinc.sh: line 27: 3619596 Aborted (core dumped) CUDA_VISIBLE_DEVICES=1 fairseq-train --user-dir ../../graphormer --num-workers 16 --ddp-backend=legacy_ddp --dataset-name zinc --dataset-source pyg --task graph_prediction --criterion l1_loss --arch graphormer_slim --num-classes 1 --attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 --optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.01 --lr-scheduler polynomial_decay --power 1 --warmup-updates 60000 --total-num-update 400000 --lr 2e-4 --end-learning-rate 1e-9 --batch-size 64 --fp16 --data-buffer-size 20 --encoder-layers 12 --encoder-embed-dim 80 --encoder-ffn-embed-dim 80 --encoder-attention-heads 8 --max-epoch 10000 --save-dir ./ckpts
It seems that running pip install . --use-feature=in-tree-build and python setup.py build_ext --inplace causes the error Segmentation fault (core dumped).
Hi @skye95git, thanks very much for using Graphormer. We have already noticed your issue and plan to look into it. However, it takes time to reproduce every corner case reported across all issues, and we are not full-time maintainers, so please stay tuned if you cannot solve this problem yourself, and kindly stop raising your question in other issue threads. Thanks very much for your understanding.
Sorry, I just want to provide one more piece of information. Every time I run examples/property_prediction/zinc.sh, it hangs with no output. When I interrupt it, the following traceback is displayed:
Traceback (most recent call last):
File "../../graphormer/entry.py", line 4, in <module>
from model import Graphormer
File "/home/linjiayi/Graphormer/graphormer/model.py", line 4, in <module>
from data import get_dataset
File "/home/linjiayi/Graphormer/graphormer/data.py", line 5, in <module>
from wrapper import MyGraphPropPredDataset, MyPygPCQM4MDataset, MyZINCDataset
File "/home/linjiayi/Graphormer/graphormer/wrapper.py", line 7, in <module>
from ogb.graphproppred import PygGraphPropPredDataset
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/ogb/graphproppred/__init__.py", line 1, in <module>
from .evaluate import Evaluator
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/ogb/graphproppred/evaluate.py", line 1, in <module>
from sklearn.metrics import roc_auc_score, average_precision_score
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/sklearn/__init__.py", line 82, in <module>
from .base import clone
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/sklearn/base.py", line 17, in <module>
from .utils import _IS_32BIT
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/sklearn/utils/__init__.py", line 25, in <module>
from . import _joblib
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/sklearn/utils/_joblib.py", line 7, in <module>
import joblib
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/joblib/__init__.py", line 113, in <module>
from .memory import Memory, MemorizedResult, register_store_backend
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/joblib/memory.py", line 32, in <module>
from ._store_backends import StoreBackendBase, FileSystemStoreBackend
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/joblib/_store_backends.py", line 15, in <module>
from .backports import concurrency_safe_rename
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/joblib/backports.py", line 7, in <module>
from distutils.version import LooseVersion
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 963, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 906, in _find_spec
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/_distutils_hack/__init__.py", line 90, in find_spec
return method()
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/_distutils_hack/__init__.py", line 101, in spec_for_distutils
mod = importlib.import_module('setuptools._distutils')
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/setuptools/__init__.py", line 16, in <module>
import setuptools.version
File "/home/linjiayi/anaconda3/envs/graphormer/lib/python3.7/site-packages/setuptools/version.py", line 1, in <module>
import pkg_resources
File "<frozen importlib._bootstrap>", line 202, in _lock_unlock_module
File "<frozen importlib._bootstrap>", line 98, in acquire
KeyboardInterrupt
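The traceback shows the process waiting on an import lock while setuptools' _distutils_hack module intercepts joblib's from distutils.version import LooseVersion import. A quick, standard-library-only way to see whether that shim is present in a given environment is sketched below; the function name is hypothetical, and it only reports state, it does not confirm that the shim is the root cause:

```python
import importlib.util
import os


def distutils_shim_status():
    """Describe whether setuptools' distutils shim is installed (hypothetical helper).

    setuptools ships a top-level `_distutils_hack` package that redirects
    `import distutils` to its bundled copy (setuptools._distutils), which is
    the path visible in the traceback above.
    """
    spec = importlib.util.find_spec("_distutils_hack")
    return {
        "shim_installed": spec is not None,
        "shim_location": spec.origin if spec is not None else None,
        "SETUPTOOLS_USE_DISTUTILS": os.environ.get("SETUPTOOLS_USE_DISTUTILS"),
    }


for key, value in distutils_shim_status().items():
    print(f"{key}: {value}")
```

Running this in the environment that hangs and in one that works makes it easier to compare them when trying the pip uninstall setuptools workaround reported in this thread.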
I have solved it by uninstalling setuptools:
pip uninstall setuptools
Thanks.
Hi @skye95git, I have encountered the same behavior, i.e., there is no output at the console after running bash zinc.sh, and interrupting the script produces a similar File "<frozen importlib._bootstrap>", line 107, in acquire error message. However, running pip uninstall setuptools did not fix the issue; instead, bash zinc.sh now throws terminate called after throwing an instance of 'std::bad_alloc' (which is fixed by reinstalling setuptools). Could you please advise how you successfully ran the ZINC training example? Thanks.
In case others have this issue, cc: @mswzeus from https://github.com/microsoft/Graphormer/issues/99. @zhengsx, I'd be happy to open a new issue or use Discussions if you'd prefer.
We have noticed similar problems recently, and our guess is that a recent library upgrade causes them. We're working on it; in the meantime, you could try changing your CUDA/PyTorch version. cu110 is recommended.
Thanks @zhengsx, will try to downgrade PyTorch and will report back if it solves the problem. Please don't hesitate to let me know if there's anything else I can do to help find a fix!
FYI, I tried downgrading to torch.version.cuda==10.2, but this did not fix the issue (i.e., bash zinc.sh still hangs). Will try cu110.
Hi, the std::bad_alloc problem occurred when I was running v1.0. If you want to run bash zinc.sh on the master branch, first make sure your environment (Python, CUDA, PyTorch, and PyTorch Geometric) is configured correctly:
python
>>> import torch
>>> torch.__version__
'1.11.0'
>>> torch.version.cuda
'11.3'
>>> torch.cuda.is_available()
True
python
>>> from torch_geometric.data import DataLoader
If the above steps work, uninstall setuptools with pip uninstall setuptools before running bash zinc.sh.
This works for me. Hope it works for you.
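The interactive checks above can also be collected into one script to run before bash zinc.sh. A minimal sketch that uses only the calls shown in that comment; the function name check_environment is hypothetical, and the values that worked for this commenter (PyTorch 1.11.0, CUDA 11.3) are a data point from this thread, not an official requirement:

```python
import importlib


def check_environment():
    """Collect the interactive environment checks into one dict (hypothetical helper).

    Missing packages are reported as None/False instead of raising, so the
    script also runs on a machine where PyTorch is not installed.
    """
    report = {
        "torch_version": None,
        "torch_cuda": None,  # CUDA version PyTorch was built with
        "cuda_available": False,
        "pyg_dataloader_importable": False,
        "pyg_error": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return report
    report["torch_version"] = str(torch.__version__)
    report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = bool(torch.cuda.is_available())
    try:
        # Same import as the manual check; a broken compiled extension
        # may raise something other than ImportError, so catch broadly.
        from torch_geometric.data import DataLoader  # noqa: F401
    except Exception as exc:
        report["pyg_error"] = repr(exc)
    else:
        report["pyg_dataloader_importable"] = True
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A hard crash such as a segmentation fault during the torch_geometric import cannot be caught this way; in that case the script itself will abort, which is also a useful signal.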
The bug seems to be caused by an ogb version upgrade. It should now be fixed by #114. Please give it a try. @skye95git @ayushnoori
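Since the hang was attributed to an ogb upgrade, and the earlier workaround involved removing setuptools, it may help to record the installed versions of those packages when reporting whether #114 fixes the problem. A small sketch using importlib.metadata (Python 3.8+, or the importlib_metadata backport on Python 3.7); the function name installed_versions is hypothetical:

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    import importlib_metadata as metadata


def installed_versions(packages=("ogb", "setuptools", "torch")):
    """Map each distribution name to its installed version, or None (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```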
I am having the same issue ('Segmentation fault' when running bash zinc.sh), and pip uninstall setuptools did not fix it. Any ideas?
Hi @doloresgarcia, I have encountered the same problem. Have you figured out how to solve it?
I installed Graphormer following the guide:
1) Create and activate a conda environment with Python 3.9
2) Run the following commands
3) Train a Graphormer-slim on ZINC-500K on a single GPU card
I ran this command for half an hour and got no output:
Occasionally it does produce output, but then the following error is reported:
I also tried to train a Graphormer-base on the PCQM4M dataset on multiple GPU cards using bash pcqv1.sh, and got no response either. Is there a problem with the dataset download? How can I solve this?