Closed jiyoonlim123 closed 2 years ago
Hi @jiyoonlim123. `torch.utils.checkpoint` is not used in ogb_eff/ogbn_proteins. Sorry that I forgot to clean up the unused code snippets. I have just cleaned them up to make the code clearer.
Thanks for the quick reply.
Then `InvertibleModuleWrapper` and `InvertibleCheckpointFunction` are used in the current code. These two modules store the output of each layer, so they would store O(L) outputs in total.
However, if we use reversible connections, we can recompute the inputs of each layer from the outputs of the last layer, so only O(1) outputs need to be kept.
Am I misunderstanding the code's behavior, or is there a reason to store the output of every layer?
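For reference, the reversible-connection idea can be sketched with a generic additive coupling block. This is my own toy illustration, not the repo's RevConv implementation; the names `AdditiveReversibleBlock`, `f`, and `g` are made up for this example:

```python
import torch
import torch.nn as nn

class AdditiveReversibleBlock(nn.Module):
    """Generic additive coupling (RevNet-style): the inputs can be
    reconstructed exactly from the outputs, so intermediate layer
    outputs need not be stored during the forward pass."""

    def __init__(self, dim):
        super().__init__()
        # f and g stand in for arbitrary sub-networks (e.g. GNN layers)
        self.f = nn.Linear(dim, dim)
        self.g = nn.Linear(dim, dim)

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recover the inputs from the outputs alone -- no stored activations.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

block = AdditiveReversibleBlock(4)
x1, x2 = torch.randn(2, 4), torch.randn(2, 4)
with torch.no_grad():
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```

Because `inverse` reconstructs `(x1, x2)` from `(y1, y2)`, a stack of such blocks only needs the final output in memory, which is where the O(1) claim comes from.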
The memory for the node features in every layer is freed: https://github.com/lightaime/deep_gcns_torch/blob/1b840faed83363098587eacac111a6317a927195/eff_gcn_modules/rev/gcn_revop.py#L58. The other inputs (the adjacency and edge features) are saved, but they do not change across layers.
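To illustrate the storage-freeing idea, here is a minimal toy example (my own, not the code in `gcn_revop.py`; it assumes PyTorch 2.x for `untyped_storage()`): once a layer's input can be reconstructed from its output, the buffer backing the input tensor can be released right after the forward pass.

```python
import torch

# Toy sketch of freeing a tensor's underlying storage, as an invertible
# checkpoint can do for node features after the forward pass.
x = torch.randn(1000, 256)           # float32 node features: 1000*256*4 bytes
x.untyped_storage().resize_(0)       # release the underlying buffer
print(x.untyped_storage().nbytes())  # 0 -- the features no longer occupy memory
# During backward, the module would resize the storage back and refill it
# with values recomputed (via the inverse) from the next layer's output.
```

The tensor object itself survives with zero-byte storage, which is why the backward pass can later restore it in place.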
I understand, but what I'm asking about is the output of the layers. Don't we only need the output of the last layer, not the output of every layer? Since every layer stores its output, the model's memory consumption would be O(L).
Hi @jiyoonlim123. Do you see the memory consumption increase linearly as you increase the number of layers?
According to 'Training Graph Neural Networks with 1000 Layers', checkpointing consumes more memory than RevConv. However, there is an invertible checkpoint in your code.
Is it correct that the invertible checkpoint does the work of `torch.utils.checkpoint` so that it is compatible with RevConv? If so, is there any reason for adding this feature, given that checkpointing consumes more memory than reversible connections?
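For contrast, plain activation checkpointing with `torch.utils.checkpoint` can be sketched as below (a generic example, not the repo's invertible variant): the input to every checkpointed layer is still saved, so memory grows with depth, whereas a reversible connection keeps only the last layer's output.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(8)])
x = torch.randn(32, 64, requires_grad=True)

out = x
for layer in layers:
    # Activations inside each layer are recomputed during backward,
    # but `out` (each layer's input) is retained for all 8 layers,
    # so memory is still O(L) in the number of layers.
    out = checkpoint(layer, out, use_reentrant=False)

out.sum().backward()
print(x.grad.shape)  # torch.Size([32, 64])
```

An invertible checkpoint combines both ideas: it recomputes like `torch.utils.checkpoint`, but reconstructs each layer's input via the inverse instead of saving it, which is how it reaches O(1) activation memory.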