rampasek / GraphGPS

Recipe for a General, Powerful, Scalable Graph Transformer
MIT License

Saving LapPE instead of precomputing every run #7

Closed: rish-16 closed this issue 2 years ago

rish-16 commented 2 years ago

Hello Ladislav!

You may remember me from the LoGaG talk! Thanks for the session :D

I have been playing around with the GraphGPS configs and realised that the LapPE precomputation runs from scratch every time I launch the pcqm4m-GPS.yaml config with main.py. Would it be possible to add a patch that saves this precomputed information locally, so I can reuse it without rerunning the same operation every time?

I'm benchmarking such models for my research and will be launching training runs regularly, so I was hoping to find a way to avoid the precomputation each time.

Appreciate your consideration, enjoyed reading the paper!

rampasek commented 2 years ago

Hi Rishabh!

Yes that is possible, but would need a little bit of refactoring:

  1. PyG Datasets have a pre_transform hook (see the docs), which applies a transformation function to each graph in the dataset and saves the transformed version to disk, keeping it cached so it is computed only once -- the first time the dataset is downloaded and processed.
  2. The way I precompute the PE/SE stats is already implemented as a transformation function applied to each dataset, but I apply it after loading the "vanilla" dataset from disk, so it has to be recomputed each time, as you mention. This could instead be hooked into the pre_transform hook and cached.
  3. To load the correct dataset with the desired precomputed PE/SE, you would need to save it under a unique name (e.g., derived from the PE/SE settings) and check that the PE/SE config matches what was used to precompute the cached version.

So that is roughly what needs to be done. I haven't implemented it yet, as it was impractical for me to keep many differently cached versions of the datasets (I'm a bit storage-limited). I don't plan to do it right now, but hopefully later on. A rough sketch of the pre_transform + unique-naming pattern is below.
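
For illustration, here is a minimal sketch of that pattern, assuming PyG >= 2.0. It uses PyG's built-in `AddLaplacianEigenvectorPE` as a stand-in for GraphGPS's own PE/SE transforms, ZINC as a placeholder dataset, and a made-up root-naming scheme; the attribute name and `k` are illustrative, not what the GraphGPS configs expect.

```python
import os.path as osp

from torch_geometric.datasets import ZINC
from torch_geometric.transforms import AddLaplacianEigenvectorPE

# PE configuration; encode it in the cache directory name so a different
# setting never silently reuses a stale cache (hypothetical naming scheme).
pe_k = 4
pre_transform = AddLaplacianEigenvectorPE(k=pe_k, attr_name='EigVecs')

# One root per PE configuration: PyG processes the dataset (and applies
# pre_transform) only if no processed files exist under this root yet.
root = osp.join('datasets', f'ZINC-LapPE-k{pe_k}')

# First run: download + process + apply pre_transform, then cache to disk.
# Later runs with the same root just load the cached, PE-augmented tensors.
train_dataset = ZINC(root, subset=True, split='train', pre_transform=pre_transform)

print(train_dataset[0])  # each graph now carries an `EigVecs` attribute of shape [num_nodes, pe_k]
```

The key point is that the PE settings are baked into `root`, so changing the config forces a fresh precomputation instead of silently loading a mismatched cache.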

rish-16 commented 2 years ago

That's a great suggestion, thanks! I'll look into it :D

Appreciate the help!