williamleif / GraphSAGE

Representation learning on large graphs using stochastic graph convolutions.

A question about the ppi dataset #78

Open DinikaSen opened 5 years ago

DinikaSen commented 5 years ago

I would like to know exactly which dataset was used to generate the toy_ppi dataset in the example_data folder, that is, the input files from which the pre-processed dataset was built. Which node features are available in the dataset? Could you share a link to where the dataset is available?

yashu88 commented 5 years ago

I think you can check https://downloads.thebiogrid.org/BioGRID to find the source.

preetham-salehundam commented 5 years ago

@DinikaSen did you get the necessary info from the URL mentioned by @yashu88?

RexYing commented 5 years ago

For raw data, http://snap.stanford.edu/ohmnet/ has the graph structure; http://software.broadinstitute.org/gsea/msigdb/collections.jsp has the feature and label information. The c1, c3, and c7 collections are the feature sets; the GO collection is the label set.
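To make the feature description above concrete, here is a minimal sketch of how gene-set collections could be turned into per-protein features. It assumes the collections are downloaded in MSigDB's tab-separated GMT format (set name, description, then member genes) and that each feature is simply binary membership of a gene in a set. The function names `parse_gmt` and `membership_matrix` and the toy data are hypothetical, and this is not confirmed to be the preprocessing the authors actually used.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a dict {set_name: set_of_genes}.

    Each GMT line is tab-separated: set name, description, then member genes.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def membership_matrix(gene_sets, genes):
    """Return (rows, set_names): rows[i][j] is 1 if genes[i] is in set j, else 0.

    Assumption: binary membership is used as the feature value.
    """
    set_names = sorted(gene_sets)
    rows = [
        [1 if gene in gene_sets[name] else 0 for name in set_names]
        for gene in genes
    ]
    return rows, set_names


# Toy data only; real collections contain thousands of sets.
toy_gmt = [
    "SET_A\thttp://example.invalid/a\tG1\tG2",
    "SET_B\thttp://example.invalid/b\tG2\tG3",
]
toy_genes = ["G1", "G2", "G3", "G4"]
features, names = membership_matrix(parse_gmt(toy_gmt), toy_genes)
```

With real data, `parse_gmt` would be called on an open file handle for each collection, and a label matrix could be built the same way from the GO collection.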

zch42 commented 4 years ago

I have a question on the feature sets. As the dimensions of c1, c3, c7 are very large, how did you represent each protein using 50-dimensional vectors?

Thanks in advance!
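The thread does not say how the 50-dimensional vectors were obtained. As an illustration only, one common way to compress a wide, sparse gene-set membership matrix is a truncated SVD (equivalent to PCA after centering). The sketch below assumes a binary proteins-by-gene-sets matrix as input; the function name `reduce_features` and the random toy matrix are hypothetical, and nothing here is confirmed as the method used for the PPI data.

```python
import numpy as np


def reduce_features(X, k=50):
    """Project the rows of X onto its top-k principal directions.

    Assumption: truncated SVD on the column-centered matrix; the original
    preprocessing for the PPI dataset may have used something else entirely.
    """
    X = np.asarray(X, dtype=np.float64)
    Xc = X - X.mean(axis=0, keepdims=True)  # center each feature column
    U, S, _Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, S.shape[0])  # cannot keep more components than exist
    return U[:, :k] * S[:k]  # row coordinates in the top-k subspace


# Toy stand-in for a sparse membership matrix: 200 proteins, 1000 gene sets.
rng = np.random.default_rng(0)
X_toy = (rng.random((200, 1000)) < 0.05).astype(np.float64)
Z = reduce_features(X_toy, k=50)
```

For matrices too large for a dense SVD, a sparse or randomized solver would be the usual substitute; the projection idea stays the same.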

knightXun commented 4 years ago

Could you open-source your PPI data preprocessing code? @RexYing

Sutongtong233 commented 2 years ago

> I have a question on the feature sets. As the dimensions of c1, c3, c7 are very large, how did you represent each protein using 50-dimensional vectors?
>
> Thanks in advance!

I have the same question.