> Profiling seems to indicate that the time is spent in queues.py/resource_sharer.py/connection.py
Thanks for sharing the finding. Given that the performance gap is huge, I've started trying to reproduce this on my side to catch all possible causes.
Here are some follow-up questions for the repro:
pretrain_contextpred.py?
wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Also, when you get a chance, it'd be nice if you could try reducing num_workers and see whether the performance improves.
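One way to run that experiment is to time a single pass over the loader for several num_workers values. The sketch below is minimal and assumes a synthetic TensorDataset as a stand-in for the actual molecular dataset. `time_loader` is a hypothetical helper and is not part of pretrain-gnns or PyG. Absolute timings will differ from the real pipeline, so treat this as a template for the comparison rather than as a benchmark.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_loader(num_workers: int, n: int = 2048, batch_size: int = 64):
    """Iterate once over a synthetic dataset and return (num_batches, seconds)."""
    # Synthetic stand-in for the real dataset (assumption, not pretrain-gnns data).
    dataset = TensorDataset(torch.randn(n, 16), torch.randint(0, 2, (n,)))
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    num_batches = 0
    for _features, _labels in loader:
        num_batches += 1
    return num_batches, time.perf_counter() - start


if __name__ == "__main__":
    # With num_workers > 0, batches are produced in worker processes and sent to
    # the main process through multiprocessing queues; num_workers=0 loads in-process.
    for workers in (0, 2, 4):
        batches, seconds = time_loader(workers)
        print(f"num_workers={workers}: {batches} batches in {seconds:.3f}s")
```

If the time drops noticeably at lower num_workers values, inter-process transfer is likely a large share of the cost.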
Thank you for getting back to me so quickly!
pretrain_contextpred.py
However, I then wanted to create a minimal environment so that you could reproduce it. I had used such a minimal environment for the PyG 2.2.0 configuration before, but not for my larger PyG 2.3.0 one. When I tested the latter in a minimal environment, it actually ran just as fast. So it seems that another package is interfering. I am posting my full environment at the very end, after the output of the PyTorch script, in case you have any idea where the slowdown could come from. I'll also check whether I can find out more.
Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux release 8.8 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.28
Python version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-477.15.1.el8_8.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 535.54.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
CPU MHz: 3292.669
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
BogoMIPS: 4500.36
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] torch==1.13.1
[pip3] torch-geometric==2.3.0
[pip3] torch-scatter==2.1.1+pt113cu117
[pip3] torch-sparse==0.6.17+pt113cu117
[pip3] torch-spline-conv==1.2.2+pt113cu117
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py38h5eee18b_1
[conda] mkl_fft 1.3.6 py38h417a72b_1
[conda] mkl_random 1.2.2 py38h417a72b_1
[conda] numpy 1.24.3 py38hf6e8229_1
[conda] numpy-base 1.24.3 py38h060ed82_1
[conda] pyg 2.3.0 py38_torch_1.13.0_cu117 pyg
[conda] pytorch 1.13.1 py3.8_cuda11.7_cudnn8.5.0_0 pytorch
[conda] pytorch-cuda 11.7 h778d358_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch-scatter 2.1.1+pt113cu117 pypi_0 pypi
[conda] torch-sparse 0.6.17+pt113cu117 pypi_0 pypi
[conda] torch-spline-conv 1.2.2+pt113cu117 pypi_0 pypi
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
_py-xgboost-mutex 2.0 cpu_0 conda-forge
absl-py 1.4.0 pypi_0 pypi
appdirs 1.4.4 pyhd3eb1b0_0
array-record 0.2.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
autograd 1.5 pypi_0 pypi
autograd-gamma 0.5.0 pypi_0 pypi
blas 1.0 mkl
boost 1.78.0 py38h4e30db6_4 conda-forge
boost-cpp 1.78.0 h6582d0a_3 conda-forge
bottleneck 1.3.5 py38h7deecbd_0
brotli 1.0.9 h166bdaf_8 conda-forge
brotli-bin 1.0.9 h166bdaf_8 conda-forge
brotlipy 0.7.0 py38h27cfd23_1003
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.7.22 hbcca054_0 conda-forge
cachetools 5.3.0 pypi_0 pypi
cairo 1.16.0 hbbf8b49_1016 conda-forge
cairocffi 1.5.1 pypi_0 pypi
cairosvg 2.5.2 pypi_0 pypi
certifi 2023.7.22 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py38h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.3 pypi_0 pypi
cloudpickle 2.2.1 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
contextlib2 21.6.0 pypi_0 pypi
contourpy 1.0.7 py38hfbd4bf9_0 conda-forge
cryptography 39.0.1 py38h9ce1e76_0
cssselect2 0.7.0 pypi_0 pypi
cuda-cudart 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cython 0.29.35 pypi_0 pypi
dask 2022.6.1 pypi_0 pypi
dataclasses 0.6 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
deprecated 1.2.13 pypi_0 pypi
descriptastorus 2.6.0 pypi_0 pypi
dgl 1.1.1 py38_0 dglteam
dm-sonnet 2.0.1 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
docstring-parser 0.15 pypi_0 pypi
etils 1.3.0 pypi_0 pypi
expat 2.5.0 hcb278e6_1 conda-forge
filelock 3.12.2 pypi_0 pypi
flatbuffers 1.12 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.39.4 py38h01eb140_0 conda-forge
formulaic 0.6.1 pypi_0 pypi
freetype 2.12.1 hca18f0e_1 conda-forge
fsspec 2023.5.0 pypi_0 pypi
future 0.18.3 pypi_0 pypi
fuzzywuzzy 0.18.0 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
gettext 0.21.1 h27087fc_0 conda-forge
gin-config 0.5.0 pypi_0 pypi
google-api-core 2.11.0 pypi_0 pypi
google-api-python-client 2.87.0 pypi_0 pypi
google-auth 2.18.1 pypi_0 pypi
google-auth-httplib2 0.1.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
googleapis-common-protos 1.59.0 pypi_0 pypi
gpflow 2.5.2 pypi_0 pypi
graph-nets 1.1.0 pypi_0 pypi
graphlib-backport 1.0.3 pypi_0 pypi
greenlet 2.0.2 py38h17151c0_1 conda-forge
grpcio 1.54.2 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
hdbscan 0.8.27 pypi_0 pypi
httplib2 0.22.0 pypi_0 pypi
huggingface-hub 0.16.4 pypi_0 pypi
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 py38h06a4308_0
importlib-metadata 6.6.0 pypi_0 pypi
importlib-resources 5.12.0 pyhd8ed1ab_0 conda-forge
importlib_resources 5.12.0 pyhd8ed1ab_0 conda-forge
intel-openmp 2023.1.0 hdb19cb5_46305
interface-meta 1.3.0 pypi_0 pypi
jinja2 3.1.2 py38h06a4308_0
joblib 1.1.1 py38h06a4308_0
kaggle 1.5.13 pypi_0 pypi
keras 2.9.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 py38h43d8883_1 conda-forge
lark 1.1.5 pypi_0 pypi
lazy-loader 0.2 pypi_0 pypi
lcms2 2.15 haa2dc70_1 conda-forge
ld_impl_linux-64 2.38 h1181459_1
lerc 4.0.0 h27087fc_0 conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libclang 16.0.0 pypi_0 pypi
libcublas 11.10.3.66 0 nvidia
libcufft 10.7.2.124 h4fbf590_0 nvidia
libcufile 1.6.1.9 0 nvidia
libcurand 10.3.2.106 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libdeflate 1.18 h0b41bf4_0 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.4 h6a678d5_0
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libglib 2.76.3 hebfc3b9_0 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libjpeg-turbo 2.1.5.1 h0b41bf4_0 conda-forge
libnpp 11.7.4.75 0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libpng 1.6.39 h753d276_0 conda-forge
libprotobuf 3.20.3 he621ea3_0
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libtiff 4.5.0 ha587672_6 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.3.0 h0b41bf4_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxgboost 1.7.4 cpu_h6e95104_0 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
lifelines 0.27.7 pypi_0 pypi
littleutils 0.2.2 pypi_0 pypi
llvm-openmp 16.0.4 h4dfa4b3_0 conda-forge
llvmlite 0.40.0 pypi_0 pypi
locket 1.0.0 pypi_0 pypi
lxml 4.9.2 pypi_0 pypi
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.1 py38h7f8727e_0
matplotlib-base 3.7.1 py38hd6c3c57_0 conda-forge
mkl 2023.1.0 h6d00ec8_46342
mkl-service 2.4.0 py38h5eee18b_1
mkl_fft 1.3.6 py38h417a72b_1
mkl_random 1.2.2 py38h417a72b_1
ml-collections 0.1.1 pypi_0 pypi
mordredcommunity 2.0.2 pyhd8ed1ab_0 conda-forge
multipledispatch 0.6.0 pypi_0 pypi
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 1.8.1 pypi_0 pypi
ngboost 0.3.12 pypi_0 pypi
numba 0.57.0 pypi_0 pypi
numexpr 2.8.4 py38hc78ab66_1
numpy 1.24.3 py38hf6e8229_1
numpy-base 1.24.3 py38h060ed82_1
oauth2client 4.1.3 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
ogb 1.3.6 pypi_0 pypi
opencv-python-headless 4.7.0.72 pypi_0 pypi
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 1.1.1v hd590300_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
outdated 0.2.2 pypi_0 pypi
packaging 23.0 py38h06a4308_0
pandas 1.4.2 py38h295c915_0
pandas-flavor 0.5.0 pypi_0 pypi
partd 1.4.0 pypi_0 pypi
pcre2 10.40 hc3806b6_0 conda-forge
pillow 9.5.0 py38h885162f_1 conda-forge
pip 23.0.1 py38h06a4308_0
pixman 0.40.0 h36c2ea0_0 conda-forge
pooch 1.4.0 pyhd3eb1b0_0
portalocker 2.7.0 pypi_0 pypi
promise 2.3 pypi_0 pypi
protobuf 3.19.6 pypi_0 pypi
psutil 5.9.0 py38h5eee18b_0
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
py-cpuinfo 9.0.0 pypi_0 pypi
py-xgboost 1.7.4 cpu_py38h66f0ec1_0 conda-forge
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pycairo 1.23.0 py38h190342e_0 conda-forge
pycocotools 2.0.6 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyg 2.3.0 py38_torch_1.13.0_cu117 pyg
pygcl 0.1.2 pypi_0 pypi
pynndescent 0.5.10 pypi_0 pypi
pyopenssl 23.0.0 py38h06a4308_0
pyparsing 3.0.9 py38h06a4308_0
pysocks 1.7.1 py38h06a4308_0
pytdc 0.4.1 pypi_0 pypi
python 3.8.16 h7a1cb2a_3
python-dateutil 2.8.2 pyhd3eb1b0_0
python-slugify 8.0.1 pypi_0 pypi
python_abi 3.8 2_cp38 conda-forge
pytorch 1.13.1 py3.8_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2022.7 py38h06a4308_0
pyyaml 6.0 pypi_0 pypi
rdkit 2023.3.1 pypi_0 pypi
rdkit-pypi 2022.9.5 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.5.5 pypi_0 pypi
reportlab 3.6.13 py38h57c54bf_0 conda-forge
requests 2.29.0 py38h06a4308_0
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
sacrebleu 2.3.1 pypi_0 pypi
scikit-learn 1.2.2 py38h6a678d5_1
scikit-multilearn 0.2.0 pypi_0 pypi
scipy 1.10.1 py38hf6e8229_1
seaborn 0.11.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
seqeval 1.2.2 pypi_0 pypi
setuptools 66.0.0 py38h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sonnet 0.1.6 pypi_0 pypi
sqlalchemy 1.4.46 py38h1de0b5d_0 conda-forge
sqlite 3.41.2 h5eee18b_0
tabulate 0.9.0 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0
tensorboard 2.9.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorboardx 2.2 pyhd3eb1b0_0
tensorflow 2.9.0 pypi_0 pypi
tensorflow-addons 0.20.0 pypi_0 pypi
tensorflow-datasets 4.9.0 pypi_0 pypi
tensorflow-estimator 2.9.0 pypi_0 pypi
tensorflow-hub 0.13.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.32.0 pypi_0 pypi
tensorflow-metadata 1.13.0 pypi_0 pypi
tensorflow-model-optimization 0.7.4 pypi_0 pypi
tensorflow-probability 0.17.0 pypi_0 pypi
tensorflow-text 2.9.0 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
text-unidecode 1.3 pypi_0 pypi
tf-models-official 2.7.1 pypi_0 pypi
tf-slim 1.1.0 pypi_0 pypi
threadpoolctl 2.2.0 pyh0d69192_0
tinycss2 1.2.1 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
toml 0.10.2 pypi_0 pypi
toolz 0.12.0 pypi_0 pypi
torch-scatter 2.1.1+pt113cu117 pypi_0 pypi
torch-sparse 0.6.17+pt113cu117 pypi_0 pypi
torch-spline-conv 1.2.2+pt113cu117 pypi_0 pypi
tqdm 4.65.0 py38hb070fc8_0
typed-argument-parser 1.8.0 pypi_0 pypi
typeguard 2.13.3 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
typing_extensions 4.5.0 py38h06a4308_0
umap-learn 0.5.1 pypi_0 pypi
unicodedata2 15.0.0 py38h0a891b7_0 conda-forge
uritemplate 4.1.1 pypi_0 pypi
urllib3 1.26.15 py38h06a4308_0
webencodings 0.5.1 pypi_0 pypi
werkzeug 2.3.4 pypi_0 pypi
wheel 0.38.4 py38h06a4308_0
wrapt 1.15.0 pypi_0 pypi
xarray 2023.1.0 pypi_0 pypi
xgboost 1.7.4 cpu_py38h66f0ec1_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.8.4 h8ee46fc_1 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.4.2 h5eee18b_0
zipp 3.15.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h3eb15da_6 conda-forge
I ran the same script, and I see that 2.3.0 takes 110% of the time of 2.2.0 with the versions of other libraries fixed, but not the 500% you originally reported in the description. I will still investigate the performance difference via #7795 to catch both past and future regressions, but I'm closing this issue since you mentioned that 2.3.0 runs as fast as 2.2.0.
Sorry for bothering you, but the discussion helped!
It seems to be a problem with rdkit. I used the conda version instead of pip's rdkit-pypi. Installing the former replaces python with cpython, which may be the main problem, but I'll stop investigating here.
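To narrow down which interpreter and which rdkit distributions an environment actually contains, a small standard-library check can help. The sketch below is minimal and assumes Python 3.8+, which provides `importlib.metadata`. The helper names are hypothetical. Conda packages do not always expose pip-visible metadata, so `conda list rdkit` remains the direct check for conda installs.

```python
import platform
import sys
from importlib import metadata


def describe_interpreter() -> dict:
    """Report which Python executable and implementation are running."""
    return {
        "executable": sys.executable,
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
    }


def installed_distributions(names) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(describe_interpreter())
    # Both names appearing with a version suggests two overlapping rdkit installs.
    print(installed_distributions(["rdkit", "rdkit-pypi"]))
```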
_Originally commented in https://github.com/pyg-team/pytorch_geometric/issues/3398#issuecomment-1679409220_
😵 Describe the installation problem
I am trying to run SSL code (basically pretrain-gnns).
For the newer PyG (i.e., for both above), I just updated:
In chem/model.py: add_self_loops to edge_index, _ and propagate
In chem/loader.py: cat_dim to __cat_dim__
I guess I am missing an important update. Profiling seems to indicate that the time is spent in queues.py/resource_sharer.py/connection.py.
Thank you in advance!
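To see where such time goes, the standard library's cProfile can wrap one epoch. The sketch below is minimal and assumes the training loop can be called as a function. `train_one_epoch` and `profile_call` are hypothetical names. cProfile only records the process it runs in. Time attributed to multiprocessing's queues.py or connection.py in the main process therefore likely reflects waiting for, or receiving, batches from DataLoader worker processes.

```python
import cProfile
import io
import pstats


def train_one_epoch() -> int:
    # Hypothetical placeholder for one epoch of the real training loop.
    total = 0
    for i in range(100_000):
        total += i * i
    return total


def profile_call(fn, top: int = 10):
    """Run fn under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(train_one_epoch)
    print(report)
```

You can also pass a regular expression such as `"queues|resource_sharer|connection"` to `print_stats`. This limits the report to entries whose `filename:lineno(function)` matches it.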
Environment
Installation method (conda, pip, source): conda