Mh, thanks for bringing this up. Can you do me a favor and check if https://github.com/pyg-team/pytorch_geometric/pull/9079 resolves your issues?
Yes, that fixes the jit issue!
The downside is that inference is still 2x slower than in 2.4, with minimal changes to the code (replacing `inspector.distribute()` from older versions with `inspector.collect_param_data()`).
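For future readers, here is a minimal sketch of that rename, assuming a layer that overrides `propagate()` and queries the `MessagePassing` inspector directly; `coll_dict` and the helper name are illustrative, not part of the PyG API:

```python
def collect_hook_kwargs(inspector, func_name, coll_dict):
    """Resolve the kwargs for a MessagePassing hook (e.g. 'message').

    PyG renamed Inspector.distribute() (<= 2.4) to
    Inspector.collect_param_data() (>= 2.5), so dispatch on whichever
    method exists and keep one code path for both versions.
    """
    if hasattr(inspector, "collect_param_data"):  # PyG >= 2.5
        return inspector.collect_param_data(func_name, coll_dict)
    return inspector.distribute(func_name, coll_dict)  # PyG <= 2.4
```

Inside a custom `propagate()` this would be called as `msg_kwargs = collect_hook_kwargs(self.inspector, "message", coll_dict)`, so the same layer code runs under both versions.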
2.4 version: *(tqdm stats screenshot)*
Perhaps there are other moving parts involved that were updated in the newer version; hard to say more without profiling.
Can you share some information on how I can benchmark this?
Sure, I created the `pyg2.5` branch in the repo: https://github.com/DeepGraphLearning/ULTRA/tree/pyg2.5
From there, I run this:
python script/run.py -c config/transductive/inference.yaml --dataset FB15k237 --epochs 0 --bpe null --gpus null --ckpt /<your path to the repos>/ULTRA/ckpts/ultra_4g.pth --bs 64
and look at the tqdm stats
Thanks. Will try to reproduce, and create PyG 2.5.3 afterwards.
Thanks. I looked into this. I cannot spot any performance degradation within `MessagePassing`, but your generalized rspmm kernel is a lot slower, probably because of a changed feature dimension.
`main`: output of generalized rspmm is `torch.Size([14541, 512])`
`pyg2.5`: output of generalized rspmm is `torch.Size([14541, 4096])`
Ah, the default batch size in the `main` branch in `config/transductive/inference.yaml` is 8 instead of 64 (hence the flattened shape is num_nodes x (64 x batch_size) = 14541 x 512) - could you please try with batch size 64 in `main`?
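For concreteness, a quick sanity check of that shape arithmetic; the node count and hidden size are read off the printed sizes, and the exact flattening layout is an assumption, not ULTRA's actual code:

```python
import torch

num_nodes, hidden_dim = 14541, 64  # FB15k237 entities, hidden size implied by the printed shapes

for batch_size in (8, 64):
    # Per-batch node states flattened to (num_nodes, hidden * batch)
    # before hitting the generalized rspmm kernel -- assumed layout.
    x = torch.randn(batch_size, num_nodes, hidden_dim)
    flat = x.transpose(0, 1).reshape(num_nodes, -1)
    print(batch_size, tuple(flat.shape))  # 8 -> (14541, 512), 64 -> (14541, 4096)
```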
Mh, for me both models take equally long to run (relation_model around 0.16s and entity_model around 3.51s), and `tqdm` reports equal runtime as well.
Can you double-check on your end? What does
torch.cuda.synchronize()
t = time.perf_counter()
t_pred = model(test_data, t_batch)
h_pred = model(test_data, h_batch)
torch.cuda.synchronize()
print(time.perf_counter() - t)
return for you in both versions?
Confirmed, on GPUs both versions take pretty much the same time (running on GPUs is the most standard setup anyway):
`main`: *(tqdm output screenshot)*
`pyg2.5`: *(tqdm output screenshot)*
The slowdown is then probably due to some problems with newer PyTorch versions on M1/M2 - but that's definitely out of scope for the current issue.
TL;DR for future readers: the issue with the custom `propagate` was fixed and PyG 2.5 does not affect performance on GPUs.
Thanks for looking into this!
🐛 Describe the bug
Setup: running ULTRA and, in particular, the Generalized Relational Convolution with a custom `rspmm` cuda/cpu kernel works flawlessly with PyTorch 2.1 and PyG 2.4.
Updating the env to torch 2.2.1 and PyG 2.5.0 / 2.5.1 results in the JIT compilation not taking into account the custom `propagate` function implemented in the layer. I see the compiled layer file in `~/.cache/pyg/message_passing/` generated from the original `propagate` function from `MessagePassing`, and it never invokes the custom `propagate` function. With that, a lot of other errors arise, such as missing `index` and `dim_size` kwargs for the aggregate function that are originally collected by `self._collect`.
Besides, even after explicitly defining all the necessary kwargs in the `self.propagate` call, the inference time on a standard fb15k237 dataset increases from 3 sec to 180 sec (on an M2 Max laptop). I was therefore wondering about a few questions, chiefly: is there still a supported way to use a custom `propagate` function in the layer?
Thanks for looking into that!
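For readers who hit the same missing-kwargs errors before the fix landed, here is a rough sketch of what "explicitly defining all the necessary kwargs" can look like in a custom `propagate()`. `CustomConv` and its internals are illustrative only, not the actual ULTRA layer:

```python
import torch
from torch_geometric.nn import MessagePassing


class CustomConv(MessagePassing):
    def __init__(self, dim):
        super().__init__(aggr="add")
        self.lin = torch.nn.Linear(dim, dim)

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x, num_nodes=x.size(0))

    def propagate(self, edge_index, x, num_nodes):
        # A fully custom propagate() bypasses the template-generated
        # _collect(), so aggregate() no longer receives `index` and
        # `dim_size` automatically -- pass them explicitly instead.
        src, dst = edge_index
        msg = self.message(x_j=x[src])
        out = self.aggregate(msg, index=dst, dim_size=num_nodes)
        return self.update(out)

    def message(self, x_j):
        return self.lin(x_j)
```

This is only a workaround sketch; with the fix from the PR linked above, the original custom `propagate` is picked up again and no manual kwarg plumbing is needed.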
P.S. PyG 2.5.0 compiles layers into `~/.cache/pyg/message_passing` while 2.5.1 compiles into `/var/folder/<some_gibberish>` - is that ok?
Versions