pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

some thoughts about pyg #684

Open WMF1997 opened 4 years ago

WMF1997 commented 4 years ago

🚀 Feature

0. More comments to encourage us to DIY.

1. torch_geometric.datasets.TUDataset's "once and for all"

2. Still about torch_geometric.datasets: arrangement.

3. torch_geometric.contrib (or, pyg_contrib)

4. torch_geometric.io (I have mentioned it)

5. functional support

6. torch_geometric.visualization

Motivation

I have some thoughts about PyTorch Geometric, and I write them all down here. Perhaps some of the features are not needed, but I wanted to record them anyway. I like (love) the library, and that is the only reason I am writing this long feature request.
Perhaps it can serve as a roadmap for pyg.

1. torch_geometric.datasets.TUDataset's "once and for all"

First, many thanks for sharing the datasets!

I went through All Data Sets. Downloading them one by one really takes a long time. With enough hard-disk capacity, why not do it once and for all?

one-click update of TUDatasets (a sketch follows the list):

  1. check which datasets have already been downloaded locally,
  2. compare them against the datasets listed on the site, and
  3. download and extract the rest.
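Something like this minimal sketch could do it (the NAMES list here is illustrative only; a real script would read the full catalogue from the site):

import os.path as osp

from torch_geometric.datasets import TUDataset

# Illustrative subset; the real list would be scraped from the TUDataset site.
NAMES = ['MUTAG', 'PROTEINS', 'ENZYMES', 'IMDB-BINARY']
root = 'data/TUDataset'

for name in NAMES:
    if osp.isdir(osp.join(root, name)):  # 1. already downloaded locally?
        continue                         # 2. skip datasets we already have
    TUDataset(root, name)                # 3. download and extract the rest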

2. Still about torch_geometric.datasets: arrangement.

Geometric is a really broad concept: any kind of graph qualifies: citation graphs (Cora), molecules (QM9), point clouds (ModelNet), even knowledge graphs (DBP15K)...

Right now, given only torch_geometric.datasets.DBP15K, a greenhorn (just like me) cannot tell what it is. So, IN MY OPINION, it might be better to group the datasets by usage. For example, ModelNet could live at torch_geometric.datasets.pointcloud.ModelNet, and so on.

Appendix: comparison with torchvision.datasets

As the official extension of PyTorch, torchvision can serve as a reference for our repo. Since torchvision focuses on image problems, and its datasets are well known to nearly everyone involved in deep learning, torchvision.datasets does not need to distinguish between datasets, even though, for example, MNIST is [1, 28, 28] and CIFAR10 is [3, 32, 32], with different numbers of channels. (Here, I use [C, H, W] to represent the shape.)

3. torch_geometric.contrib (or, pyg_contrib)

As we can see, handling feature requests is really hard. Sometimes the requesters have the ability to implement a feature themselves; however (perhaps most of the time, I think), we just mention it.
What's more, new ideas are endless, and we cannot push all of them and their implementations into the master branch. So... why not have a contrib package, like TensorFlow?

What I think about contrib

For example, the graph DenseNet mentioned in DeepGCNs: Can GCNs Go as Deep as CNNs? is a really good idea for point cloud segmentation, and the authors open-sourced their code (a PyTorch Geometric implementation) on GitHub.

Here is what I think the general workflow of pyg_contrib could look like (taking their repo for example):

graph DenseNet
  1. their GitHub repo (code) -> pyg_contrib (or, for a feature request: prototype code -> pyg_contrib), where -> denotes a push
  2. the code is discussed and modified (to make it much better) in pyg_contrib, by EVERYONE WHO WANTS TO BE INVOLVED. Of course, a roadmap or a kanban board is really needed here (GitHub provides kanban boards).
  3. if it is really good, or really needs to be maintained, add it to pyg; if not, remove (deprecate) it from pyg_contrib.

(added in 2019.09.25)

pyg_contrib.datasets: wiki dataset and LINQS datasets

The wiki dataset, and the LINQS datasets (provided by the LINQS group, https://linqs.soe.ucsc.edu/data), include some datasets about social relationships. I think these could be a good first example for contrib.

conclusion of pyg_contrib

As mentioned before, new ideas are endless, and contrib can never include all datasets. What PyG can do is set a standard, give some examples, and implement some of the frequently used algorithms (for example, GCN).

The datasets section of the tutorial only shows the base class's code, without a concrete implementation or an example of "how to DIY".
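For instance, a minimal end-to-end example in the style of the tutorial could look like the sketch below (the toy graphs and the root path are illustrative only):

import torch
from torch_geometric.data import Data, InMemoryDataset

class ToyDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(ToyDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return []  # nothing to download

    @property
    def processed_file_names(self):
        return ['data.pt']

    def process(self):
        # Two tiny graphs with 1-dim node features and a graph-level label.
        data_list = [
            Data(x=torch.randn(4, 1),
                 edge_index=torch.tensor([[0, 1, 2], [1, 2, 3]]),
                 y=torch.tensor([0])),
            Data(x=torch.randn(3, 1),
                 edge_index=torch.tensor([[0, 1], [1, 2]]),
                 y=torch.tensor([1])),
        ]
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

dataset = ToyDataset(root='data/toy')  # processes once, then loads from disk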

The external resources provided by Steeve Huang are a good PyG tutorial, but I feel that only two Jupyter notebooks that just "use" PyG (as mentioned in his readme.md) are perhaps not enough. And of course, hardware also counts: deep learning on graphs can be a little easier than deep learning on images. A 2-layer GCN runs relatively fast on node classification on the Cora dataset even with an Intel Core i7-3540M; with an Intel Core i7-8700 or Core i7-8750M, or with a GPU, it can be much, much faster (point cloud tasks do need a GPU). I think most of the code in a tutorial could run (fast) on a CPU.

4. torch_geometric.io (I have mentioned it)

I have mentioned this before: read and write graph files (especially point cloud files, such as .ply and .off).
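Reading already exists in the package; writing is the missing half. A minimal sketch of the read side (model.off is an assumed local file, and depending on the release the readers live under torch_geometric.read or torch_geometric.io):

from torch_geometric.io import read_off  # torch_geometric.read in older releases

data = read_off('model.off')  # Data object with pos (vertices) and face (triangles)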

5. functional support, i.e. torch_geometric.nn.functional

Mentioned in a previous issue: with a functional interface we could create (or test) nearly all kinds of structures (most of the time, for fun).
For example, initialization schemes could be tested (although, as we all know, kaiming_uniform is a good choice when the input is an image, but...). I know that reset_parameters can be a solution when the parameters need to be modified, but I do not think it is that convenient. If a weight could be assigned directly, and the layer computed from just x, edge_index, and weight, like torch.nn.functional.conv2d, it would be really nice.
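A hypothetical sketch of the calling convention I mean (gcn_conv below is not a real pyg function, and it uses a simplified mean aggregation instead of GCN's symmetric normalization; it only illustrates computing from x, edge_index, and a user-managed weight):

import torch

def gcn_conv(x, edge_index, weight):
    # Hypothetical functional layer: transform, then mean-aggregate neighbors.
    x = x @ weight
    row, col = edge_index
    out = torch.zeros_like(x).index_add_(0, row, x[col])
    deg = torch.zeros(x.size(0), device=x.device)
    deg = deg.index_add_(0, row, torch.ones(row.size(0), device=x.device)).clamp(min=1)
    return out / deg.unsqueeze(-1)

# The weight lives outside any Module, initialized however we like:
weight = torch.nn.init.kaiming_uniform_(torch.empty(16, 32))
x = torch.randn(10, 16)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
out = gcn_conv(x, edge_index, weight)  # shape [10, 32]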

6. torch_geometric.visualization

Visualization is really a big job. NOT ONLY curves, t-SNE, ...; the GRAPH itself should be considered. A colormap can show us the importance of each node (color the nodes with a colormap, just like a heatmap over an image feature map). Why not in visdom? I know that matplotlib plots can be viewed in visdom, and we can use networkx.draw() to plot a graph, so it might be possible to use visdom. (I have not researched or tested this deeply; I am just showing the possibility of using visdom.)

Example code:

import matplotlib.pyplot as plt
import networkx as nx
import visdom as vis

g = nx.karate_club_graph()
fig = plt.figure()
# Draw the graph into the current matplotlib figure.
nx.draw_circular(g, with_labels=True, node_color='#66CCFF')  # NOTE(wmf): you can use any layout/color you like.

vis_env = vis.Visdom()  # requires a running visdom server
vis_env.matplot(fig)  # sorry, only this works...

What about TensorBoard? I think TensorBoard is not that suitable for visualizing GRAPHs, although visualizing curves and t-SNE embeddings in TensorBoard is really, really cool.

Additional context

No. (If I think of something more, I will continue updating this issue.)

Yours Sincerely, MingFei Wang. (@wmf1997) 2019.09.16 22:11 (UTC+8) Tianjin, China

Added in 2019.09.17 11:30 (UTC+8):

0. More comments to encourage us to DIY.

First, thank you for your work again~! (PyG is a great framework for graph representation learning~!)

Reading source code can also be a good way of studying~ I mean, reading the implementations of graph neural networks: for example, reading the MessagePassing (abstract) base class let me understand what message passing in GNNs is, and GCNConv shows the derived class (the implementation in detail).
However, IN MY OPINION, code without enough comments might confuse people (even after they have read the paper). For example, in the authors' (Kipf & Welling) original PyTorch implementation, GCNConv uses sparse matrix multiplication, following the formula in the paper; in pyg, however, your implementation uses MessagePassing. I know the reason from rrl_gnn.pdf, but the reason, i.e. how to turn a sparse matmul into message passing, should be written down; with that documented, I think more methods could be implemented or re-implemented in pyg. A sketch of the equivalence follows.
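A toy sketch of the equivalence I mean (simplified: no self-loops and no weight matrix; made-up tensors):

import torch

edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # both edge directions
x = torch.randn(3, 4)
row, col = edge_index

deg = torch.zeros(3).index_add_(0, row, torch.ones(row.size(0)))
norm = deg[row].pow(-0.5) * deg[col].pow(-0.5)  # per-edge D^{-1/2} A D^{-1/2}

# (a) Sparse matmul, as in Kipf & Welling's code: out = A_hat @ x
A_hat = torch.sparse_coo_tensor(edge_index, norm, (3, 3))
out_spmm = torch.sparse.mm(A_hat, x)

# (b) Message passing: per-edge message norm_ij * x_j, scatter-summed at node i
msg = norm.unsqueeze(-1) * x[col]
out_mp = torch.zeros_like(x).index_add_(0, row, msg)

assert torch.allclose(out_spmm, out_mp, atol=1e-6)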

rusty1s commented 4 years ago

Thank you, this is an awesome list. We can discuss this in more detail after the ICLR deadline :)

WMF1997 commented 4 years ago

Hello @rusty1s, first, many thanks for your GREAT library.

What's more, the graph DenseNet paper is not mine; I just picked it as an example. (In fact, I have not run the code provided by the authors.)

... WOW! ICLR! You are so great! Yours sincerely, @WMF1997

rusty1s commented 4 years ago

Why closing this?

rusty1s commented 4 years ago

Hi @WMF1997 and thank you for sharing your ideas:

  1. Once and for all TUDataset: Personally, I am not a big fan of this. The TUDataset website gets updated regularly, and in general users do not need all provided datasets. In addition, it prohibits the usage of the pre_transform argument, because one may want to change it for different datasets. IMO, if one really wants to download all datasets, it should be no problem to write a wrapper function.

  2. Dataset arrangement: This is a good point. The list of provided datasets is becoming quite confusing. We could separate them into spatial, temporal, knowledge, and so on.

  3. .contrib: I personally really like this idea. I will try to add it to the repository.

  4. .io: Yes, we should rename the .read subpackage and also provide writing capabilities.

  5. .functional: We had functional support in earlier releases, but the resulting code has become quite a mess. Personally, I see no real use case in this interface (even in PyTorch). Everything you described can also be done via torch.nn.Module by overriding reset_parameters or manually assigning weight values.

  6. .visualization: This is a big one. Although I would like to provide visualization capabilities, I do not want to add too many dependencies to the package. In the end, PyG is a deep learning library, and it should stay clean and easily comprehensible. In addition, it is questionable which external visualization packages we should support. So, is there any problem with converting your data to networkx and using its visualization capabilities? (related to https://github.com/rusty1s/pytorch_geometric/issues/683)
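For reference, a minimal sketch of that route on a toy graph (to_networkx lives in torch_geometric.utils):

import matplotlib.pyplot as plt
import networkx as nx
import torch
from torch_geometric.data import Data
from torch_geometric.utils import to_networkx

data = Data(x=torch.randn(4, 2),
            edge_index=torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]]))
g = to_networkx(data, to_undirected=True)
nx.draw(g, with_labels=True)
plt.show()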

tchaton commented 4 years ago

Hey @rusty1s and @WMF1997,

I think it could also be great to have a LazyMessagePassing implementation using https://github.com/getkeops/keops (maybe it could also be used for scatter).

Check this request: https://github.com/rusty1s/pytorch_geometric/issues/689. I think having a more efficient scatter method would be great too.

Ideally, message passing would be a symbolic / lazy operation and the scatter would be redundancy-free. That could provide a great speedup while scaling to huge graphs.

Best, Thomas Chaton.

rusty1s commented 4 years ago

Hi @tchaton,

I am aware of the keops library and find it really useful; however, I do not see how it can replace the scatter call used in the MessagePassing operators. Do you have an example to showcase this?

tchaton commented 4 years ago

Hey @rusty1s, for now I am just doing a bit of research to figure out what the options are for improving pytorch geometric and making it more attractive. I think scaling and speed are the principal factors now, as the API is already extremely simple.

I don't think the current MessagePassing implementation scales well (at least not in my case). I think a lazy implementation, maybe relying on keops for x_i and x_j, could be extremely interesting.

# Turn our Tensors into KeOps symbolic variables
# (imports and input tensors added so the snippet runs as-is):
import torch
from pykeops.torch import LazyTensor

x = torch.randn(1000000, 3, requires_grad=True).cuda()
y = torch.randn(2000000, 3).cuda()

x_i = LazyTensor(x[:, None, :])  # x_i.shape = (1e6, 1, 3)
y_j = LazyTensor(y[None, :, :])  # y_j.shape = ( 1, 2e6, 3)

# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # Symbolic (1e6, 2e6, 1) matrix of squared distances
K_ij = (-D_ij).exp()                  # Symbolic (1e6, 2e6, 1) Gaussian kernel matrix

# And come back to vanilla PyTorch Tensors or NumPy arrays using
# reduction operations such as .sum(), .logsumexp() or .argmin().
# Here, the kernel density estimation   a_i = sum_j exp(-|x_i-y_j|^2)
# is computed using a CUDA online map-reduce routine that has a linear
# memory footprint and outperforms standard PyTorch implementations
# by two orders of magnitude.
a_i = K_ij.sum(dim=1)  # Genuine torch.cuda.FloatTensor, a_i.shape = (1e6, 1)
g_x = torch.autograd.grad((a_i ** 2).sum(), [x])  # KeOps supports autograd!

It could become a LazyMessagePassing with this kind of pseudocode (Lazy_Select is hypothetical):

x = LazyTensor(x)
x_i = Lazy_Select(x, edge_index[0])  # symbolic gather of target-node features
x_j = Lazy_Select(x, edge_index[1])  # symbolic gather of source-node features

out = self.message(x_i, x_j)  # arguments and output are symbolic
scatter(out, edge_index)      # scatter as an actual map-reduce operation

(On top of that, maybe a HAG implementation could be used in the scatter method.)

This would prevent loading N=2 tensors of size (edge_index size) x (feature size) onto the GPU.

I have asked people from Keops: https://github.com/getkeops/keops/issues/26

There is also https://github.com/facebookresearch/PyTorch-BigGraph, which scales to huge graphs, even if I don't like its interface.

I still don't know what the best option would be. Maybe just implement HAG in pytorch scatter. But I want to work on this as my side project, as I think it would be great for the community.

What are your thoughts?

Best, T.C

rusty1s commented 4 years ago

I haven't looked into the HAG and PyTorch-BigGraph options that closely yet, but the keops example you provided performs dense reductions instead of sparse ones. As far as I can see, keops does not provide any options for sparse reductions. It is nonetheless a great library, and I am eager to heavily integrate it in a new major release. If keops can provide a way to do sparse reductions, I will be the first one to improve the MessagePassing interface with it.

Many other options come with the downside of heavy pre-processing costs. In addition, I see no way to implement operators that integrate edge features in a more memory-efficient way. For all other operators, one can usually default to sparse matrix multiplications - we could enhance our MessagePassing interface to do so, but it would require us to check the implementation of the message function.

tchaton commented 4 years ago

@rusty1s, here is the answer from Jean Feydy on the thread at https://github.com/getkeops/keops/issues/26:

Hi @tchaton,

Thanks a lot for the links and references!
Two quick thoughts:

    Coincidentally, I just implemented a scattered_sum reduction for issue #24 (I still have to implement the gradient and some docs + unit tests before merging into master). This thread is a good illustration of the way the current framework can be extended to support sparse-like problems. If you'd like to make similar feature requests, with specific "toy examples" to optimize, please feel free to do so. Note however that the KeOps schemes will always be best suited to full and block-sparse problems, with loads of interactions to compute and nearly contiguous memory accesses. I don't think we'll ever be able to be competitive on "Twitter-like" graph structures, with super-sparse and random connectivities.

    Our short-term priority has been to diffuse KeOps in the kernel and Gaussian process communities: we're mathematicians by trade, and their problems are super close to what we know, as we share a common language, etc. That's why we've put a lot of effort into kernel-centric tutorials, R backends and JMLR/AiStats-like publications. However, I just got hired as a postdoc for three years in Michael Bronstein's team (at Imperial College, where I'm sitting right now), where all the students use PyTorch_Geometric: in the next few months/years, extending KeOps in a direction that is suited to geometric/graph deep learning will be my top priority.

Note that I will be pretty busy with paper writing until ~December: I'll be happy to read what you and @rusty1s can send me on the subject, but probably won't be able to implement anything seriously before January :-)

Best regards, Jean

jlevy44 commented 4 years ago

@rusty1s maybe building in plotly support for the visualizations? I've made some nice-looking 3-D graphs using it.

Also, @WMF1997 @rusty1s, it might be nice to include a scikit-learn-style interface for pytorch geometric (e.g. #240). These could go into contrib.

WMF1997 commented 4 years ago

Hello @rusty1s and everyone involved in this issue:

1 once_and_for_all download

I think you are right: a once-and-for-all TUDataset indeed has little use. What I originally wanted was simply to let users download all the datasets, and that is really just a job for a Python script.

2 / 3 / 4 (.datasets in detail, .contrib support, .io support)

I have no more thoughts to add.

5 .functional support

You are right: does .functional really meet a need?

I now know that using reset_parameters() is a good solution, and that deriving from (hacking) the MessagePassing base class is the intended way to build new layers.

As far as I know, the reason PyTorch needs functional support is that torch.nn.Module's forward methods are built on the functional API; more precisely, much of PyTorch's functional API is implemented in .cpp or .cu files. (I tried to use PyTorch's functional code as a reference for writing Python code in pyg, but I found no .py files behind functional.)

6 .visualization support

Yes, you are right. What your words tell us (at least, me) is that:

1 What we should focus on is not visualizing better; the only thing we should focus on is computing better. 2 One repo, one thing. Bigger Repo <=/=> Better Use. 3 "From research to production" is the motto shown on https://pytorch.org. What we focus on in pyg is research more than production. (Although GNNs developed fast in 2019 and will develop faster in the early 2020s, production use of GNNs still has some time to wait.)

yours sincerely, @WMF1997

WhatAShot commented 4 years ago

Visualization is really important.

lightaime commented 4 years ago

Thanks @WMF1997 and @rusty1s for the discussion on adding our DeepGCNs project to .contrib. I am happy to help with this if needed.

Hafplo commented 3 years ago

@rusty1s @WMF1997 Regarding pyg.io:

  1. Can we add some more documentation (examples? tests / sample files?) to it?

  2. There are a lot of different file formats out there, and I don't think it's reasonable to support all of them. I understand that Data() objects are the way to go, but perhaps we can define a file format for "pyg graphs" (it needs to be general enough, yet flexible and compressed)? If we had a unified file-format interface, it would simplify reading, writing, and parsing (just moving the "pain" to creating those files in the first place). And since every dataset needs to be saved somehow, somewhere, only one person would need to do the dataset conversion and upload it online.

As an example: For my current dataset, I define 3 CSV files (for node features, edge index and edge features), as well as collect some metadata for each new graph. I think it is general enough to capture all types of graphs. I don't know if it is 'compressed' enough. Maybe it needs to allow using only numbers (remove string features using some encoding before saving it).