kimiyoung / planetoid

Semi-supervised learning with graph embeddings

MIT License

882 stars 298 forks

original datasets ? #5

Open Chunpai opened 6 years ago

Chunpai commented 6 years ago

Hello, could you please provide the original dataset before your preprocessing? It does not seem to match the dataset at https://linqs.soe.ucsc.edu/data. Your train/val/test splits seem to be well chosen, and some labels also differ from the original dataset at that link. Do you have any ideas about this? Thanks.

ghost commented 6 years ago

Hi, I have the same question too. Could you please give me some advice?

Chunpai commented 6 years ago

No. If you are working on something related to GCN, you probably need to just use the random splits version.

monk1337 commented 6 years ago

Hi, I am working on this but am confused about the dataset. There is no clear explanation of how to convert the original dataset for GCN. Can you provide the raw data preprocessing code, or instructions for how to do that?

Thank you!

ghost commented 5 years ago

Hi, I have the same question too. Could you please provide the raw data preprocessing code?

daiquanyu commented 5 years ago

Hey, has anybody solved this problem? Could you share your code with me? Many thanks.

Davidlihuang commented 5 years ago

Hi, I tried to write code to create a dataset like yours, but something seems wrong. I use X = x_train + x_val + x_test (L = L_train + L_val + L_test) to create the graph (X.shape[0] * X.shape[0]). Am I wrong? Do you have any idea about this problem? If so, could you tell me? Thank you very much!
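For reference, a minimal sketch of the construction described here, assuming the graph comes from a separate list of citation edges indexed by row position in X. The edge list, the toy data, and the `build_adjacency` helper are hypothetical and not taken from the repository:

```python
def build_adjacency(num_nodes, edges):
    """Return a symmetric num_nodes x num_nodes 0/1 adjacency matrix."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1  # treat citation links as undirected
    return adj


# Hypothetical toy splits: feature rows and integer labels.
x_train, x_val, x_test = [[1, 0], [0, 1]], [[1, 1]], [[0, 0]]
l_train, l_val, l_test = [0, 1], [0], [1]

# List concatenation stacks the rows in train, val, test order.
X = x_train + x_val + x_test
L = l_train + l_val + l_test

# Hypothetical citation links, given as pairs of row indices into X.
edges = [(0, 2), (1, 3)]
A = build_adjacency(len(X), edges)
```

The point of the sketch is only that the N x N matrix is indexed by the same row order as X and L; it says nothing about how the repository's own splits or labels were produced.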

o0windseed0o commented 5 years ago

@Chunpai I ran into the same issue when regenerating the data: I cannot get such high performance with random selection. Have you solved your issue?

readergy commented 3 years ago

Hi, I have the same question too. Could you please provide the raw data preprocessing code?

readergy commented 3 years ago

Excuse me, have you found the solution?

andrew-korea commented 1 year ago

The original dataset (http://www.cs.umd.edu/~sen/lbc-proj/LBC.html) was processed and saved using Python's pickle module. This notebook covers the data format transformation: https://github.com/NIRVANALAN/gcn_analysis/blob/master/notebook/Plantenoid%20Citation%20Data%20Format%20Transformation.ipynb
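As a rough sketch of what reading such pickled files might look like, assuming they were written under Python 2 (hence `encoding="latin1"` when loading in Python 3) and that the graph file holds a dict mapping each node index to a list of neighbor indices. The helper names and the toy graph are hypothetical:

```python
import pickle


def load_pickled(path):
    """Unpickle one file; latin1 lets Python 3 read Python 2 pickles."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def edges_from_graph_dict(graph):
    """Turn {node: [neighbors]} into a set of undirected edges (u, v) with u < v."""
    edges = set()
    for u, neighbors in graph.items():
        for v in neighbors:
            if u != v:  # skip self-loops
                edges.add((min(u, v), max(u, v)))
    return edges


# Toy graph in the assumed dict format (not real data).
toy_graph = {0: [1, 2], 1: [0], 2: [0, 2]}
toy_edges = edges_from_graph_dict(toy_graph)
```

In practice one would call `load_pickled` on each downloaded file and inspect the returned object's type and shape before relying on any of the assumptions above.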