sezinata / MANE

Multi-View Collaborative Network Embedding

About data processing #2

Closed: Dangwei-dw closed this issue 1 year ago

Dangwei-dw commented 1 year ago

Hi, can you provide the code for data preprocessing? In particular, what are the IDs or names of the nodes in the dataset?

Dangwei-dw commented 1 year ago

I am asking about the PPI and GO datasets.

sezinata commented 1 year ago

Hi, for the PPI and GO datasets, the data are represented by node names before the graphs are constructed (they look like indices because they are integers, but they are node names). This is explained in the README: https://github.com/sezinata/MANE/tree/master/data/test_data/Alzheimer. After the PPI and GO graphs are constructed, the node names must be converted to indices.

From Node Classification (without attention), lines 248 and 249:

    node2idx = {n: idx for (idx, n) in enumerate(common_nodes)}  # used for the initial mapping to indices
    idx2node = {idx: n for (idx, n) in enumerate(common_nodes)}  # used after the model to convert back to node names

MANE works on the common nodes, which is the reason for this conversion.
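To illustrate the conversion, here is a minimal sketch, not the repository's actual preprocessing. The edge lists, the `nodes_of` and `to_index_edges` helpers, and the choice to build `common_nodes` as a sorted intersection of the two node sets are all assumptions made for the example. Only the two dictionary comprehensions mirror the lines quoted above.

```python
# Hypothetical edge lists. The integers are node NAMES, not indices.
ppi_edges = [(1021, 57), (57, 300), (300, 1021), (88, 57)]
go_edges = [(57, 1021), (300, 57), (412, 300)]


def nodes_of(edges):
    """Collect the set of node names that appear in an edge list."""
    return {n for edge in edges for n in edge}


# Assumption: common nodes are those present in both graphs.
# Sorting makes the name-to-index mapping deterministic.
common_nodes = sorted(nodes_of(ppi_edges) & nodes_of(go_edges))

# Same comprehensions as in the quoted lines.
node2idx = {n: idx for (idx, n) in enumerate(common_nodes)}  # name -> index
idx2node = {idx: n for (idx, n) in enumerate(common_nodes)}  # index -> name


def to_index_edges(edges):
    """Keep edges whose endpoints are both common nodes and re-label them."""
    return [
        (node2idx[u], node2idx[v])
        for u, v in edges
        if u in node2idx and v in node2idx
    ]


ppi_idx_edges = to_index_edges(ppi_edges)
go_idx_edges = to_index_edges(go_edges)

# After the model has run, an index can be mapped back to its node name.
original_name = idx2node[ppi_idx_edges[0][0]]
```

In this sketch, nodes 88 and 412 each appear in only one graph, so their edges are dropped, and the remaining nodes get contiguous indices starting at 0 that can be used directly as embedding rows.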

Dangwei-dw commented 1 year ago

I mean the ID or name of the node in the source database. For example, protein IDs in IntAct use UniProt IDs such as 'uniprotkb:P49418' and 'uniprotkb:O43426'.

sezinata commented 1 year ago

Here is the link with more information on the IntAct dataset: https://github.com/sezinata/SurveyDGP ("Recent advances in network-based methods for disease gene prediction").

Dangwei-dw commented 1 year ago

Thanks a lot!