Can't run on Nell dataset

liqimai commented 7 years ago

You reported the performance of GCN on NELL, and I noticed that you used the data provided by Yang. I downloaded NELL from Yang's GitHub, https://github.com/kimiyoung/planetoid, but when I run your program on it, I get a runtime error:

"utils.py", line 51, in load_data 
    features[test_idx_reorder, :] = features[test_idx_range, :]
ValueError: row index 9897 out of bounds

It seems that the code reorders the test data points to keep them consistent with the adjacency matrix, but some of the indices are out of bounds.

The full stacktrace:

$ python train.py --dataset nell.0.01
Traceback (most recent call last):
  File "train.py", line 29, in <module>
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/gcn-1.0-py3.5.egg/gcn/utils.py", line 51, in load_data
    features[test_idx_reorder, :] = features[test_idx_range, :]
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 289, in __getitem__
    return self._get_row_ranges(i, j)
  File "/Users/liqimai/anaconda3/lib/python3.5/site-packages/scipy/sparse/lil.py", line 329, in _get_row_ranges
    j_start, j_stop, j_stride, nj)
  File "scipy/sparse/_csparsetools.pyx", line 787, in scipy.sparse._csparsetools.lil_get_row_ranges (scipy/sparse/_csparsetools.c:11978)
ValueError: row index 9897 out of bounds
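
A toy reproduction of this failure, as a minimal sketch. It assumes, as the error suggests, that the NELL test indices do not cover every node after the training rows, so stacking `allx` and `tx` yields fewer rows than the largest test index. The helper name `pad_test_features` and the tiny matrices are made up for illustration and are not part of the GCN code.

```python
import numpy as np
import scipy.sparse as sp

def pad_test_features(allx, tx, test_idx, num_nodes):
    """Stack allx and tx so that row i holds the features of node i.

    Nodes past the training rows that have no row in tx get all-zero features.
    """
    n_train = allx.shape[0]
    padded = sp.lil_matrix((num_nodes - n_train, allx.shape[1]))
    # Row k of tx belongs to node test_idx[k].
    padded[np.asarray(test_idx) - n_train, :] = tx
    return sp.vstack((allx, padded)).tolil()

allx = sp.csr_matrix(np.ones((4, 3)))                        # nodes 0..3
tx = sp.csr_matrix(np.array([[1., 2., 3.], [4., 5., 6.]]))   # two test nodes
test_idx = [7, 5]                                            # nodes 4 and 6 have no features

naive = sp.vstack((allx, tx)).tolil()   # only 6 rows, so row 7 does not exist
features = pad_test_features(allx, tx, test_idx, num_nodes=8)
```

Indexing `naive` with row 7 fails in the same way as the traceback above, while `features` has one row per node.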

tkipf commented 7 years ago

This is the code snippet that I used to pre-process the NELL dataset from Zhilin Yang's GitHub:

    if dataset == 'nell.0.001':
        # Find relation nodes, add them as zero-vecs into the right position
        test_idx_range_full = range(allx.shape[0], len(graph))
        isolated_node_idx = np.setdiff1d(test_idx_range_full, test_idx_reorder)
        tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
        tx_extended[test_idx_range-allx.shape[0], :] = tx
        tx = tx_extended
        ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
        ty_extended[test_idx_range-allx.shape[0], :] = ty
        ty = ty_extended

        features = sp.vstack((allx, tx)).tolil()
        features[test_idx_reorder, :] = features[test_idx_range, :]

        idx_all = np.setdiff1d(range(len(graph)), isolated_node_idx)

        if not os.path.isfile("data/planetoid/{}.features.npz".format(dataset)):
            print("Creating feature vectors for relations - this might take a while...")
            features_extended = sp.hstack((features, sp.lil_matrix((features.shape[0], len(isolated_node_idx)))),
                                          dtype=np.int32).todense()
            features_extended[isolated_node_idx, features.shape[1]:] = np.eye(len(isolated_node_idx))
            features = sp.csr_matrix(features_extended)
            print("Done!")
            save_sparse_csr("data/planetoid/{}.features".format(dataset), features)
        else:
            features = load_sparse_csr("data/planetoid/{}.features.npz".format(dataset))

        adj = nx.adjacency_matrix(nx.from_dict_of_lists(graph))

Fingers crossed that it still works (haven't tested this in quite some time).

liqimai commented 7 years ago

I can run it now but the test accuracy is only 45%. The configuration I used is:

'dataset'       : 'nell.0.001',
'model'         : 'gcn',
'learning_rate' : 0.01,
'epochs'        : 200,
'hidden1'       : 64,
'dropout'       : 0.1,
'weight_decay'  : 1e-5,
'early_stopping': 10,

The weight_decay, dropout, and number of hidden units are the same as in your paper; all other settings are the defaults of your code. But the result is far from what you reported. Did I do something wrong? How can I reproduce your results?

tkipf commented 7 years ago

Your configuration looks correct. Did you manage to reproduce the results for the other datasets? Just making sure there is no other underlying issue. Does the training/validation error converge at all or does it get stuck at some low value? I can have a more detailed look into this issue early next week.

liqimai commented 7 years ago

I forked your code and modified it heavily. My modified version only reaches 45% accuracy. When I used your original program with the code snippet above, it reached about 60% accuracy, very close to the accuracy under a random split that you reported in the paper. There must be something wrong in my modified version. Thanks a lot!

Here is another code snippet, which I think you used when preprocessing NELL, in case anyone needs it:

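import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # Store the three CSR arrays plus the shape in a single .npz archive.
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # Rebuild the CSR matrix from the arrays written by save_sparse_csr.
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])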
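
The `.features.npz` file that the preprocessing snippet checks for is not part of the downloaded dataset; it is a cache written on the first run. A minimal sketch of that compute-or-load pattern follows. It uses SciPy's own `sp.save_npz` / `sp.load_npz` instead of the helpers above, and `load_or_build_features` and `build_fn` are hypothetical names chosen for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

def load_or_build_features(cache_path, build_fn):
    """Load features from cache_path if it exists; otherwise build and cache them."""
    if os.path.isfile(cache_path):
        return sp.load_npz(cache_path)
    features = sp.csr_matrix(build_fn())
    sp.save_npz(cache_path, features)
    return features

calls = []

def build_fn():
    calls.append(1)  # record each (expensive) rebuild
    return np.eye(3, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nell.0.001.features.npz")
    first = load_or_build_features(path, build_fn)   # builds and writes the cache
    second = load_or_build_features(path, build_fn)  # reads the cache, no rebuild
```

Deleting the `.npz` file forces the features to be rebuilt on the next run.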

ZacheryGuan commented 6 years ago

I've been running the same tests on the NELL dataset these days. Using the code snippets above, I get about 45% accuracy with a label rate of 0.1% (using nell.0.001, as in Table 1, dataset statistics, of your paper). However, the model reaches about 60% when the label rate increases to 1% (using nell.0.01).

I've also tried running more epochs on the given dataset, but with 0.1% of the nodes labeled the model only reaches about 50%, and training stops at 250 epochs. So maybe the result given in the paper uses the dataset with 1% labeled?

tkipf commented 6 years ago

Have you tried the hyperparameter settings described in the paper? Note that these are different for NELL than for the other datasets.

ZacheryGuan commented 6 years ago

Yes. I used different hyperparams, as described in the paper:

'dataset' : 'nell.0.001',
'model' : 'gcn',
'learning_rate' : 0.01,
'epochs' : 200,
'hidden1' : 64,
'dropout' : 0.1,
'weight_decay' : 1e-5,
'early_stopping': 10.

And I have reproduced the results on the other datasets: Cora, Citeseer and Pubmed.

tkipf commented 6 years ago

Thanks for testing. I'll have a look at it as soon as I find time for it.

For now, I would recommend having a look at a better-suited model for relational datasets like this. We recently had a paper on this: https://arxiv.org/abs/1703.06103. This should give you better and more consistent results for directed graphs with different relation types. The NELL dataset as you're using it now is preprocessed to be an undirected graph without edge types, so that the GCN model can be trained on it. You can find the original NELL dataset here: http://rtw.ml.cmu.edu/rtw/resources

tkipf commented 6 years ago

Maybe try python 2.7? Which version are you on?

rajeshneti98 commented 6 years ago

Hi Thomas, while going through the code and comparing it with the paper, I think I found a small bug. The paper mentions that you used one training example per class. Since there are 210 classes, that would mean 210 training examples, but the nell.0.001 dataset downloaded from planetoid contains just 105 training examples. Should I consider this a bug? If you used a different dataset, please provide a link to it.
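
One way to check this count is to tally the labeled training nodes per class. The sketch below assumes a one-hot label matrix and a boolean training mask like the `y_train` and `train_mask` returned by `load_data`; the helper name and the toy data are illustrative only.

```python
import numpy as np

def train_examples_per_class(y_train, train_mask):
    """Count labeled training nodes per class from one-hot labels."""
    mask = np.asarray(train_mask, dtype=bool)
    return np.asarray(y_train)[mask].sum(axis=0).astype(int)

# Toy data: 5 nodes, 3 classes, nodes 0 and 2 are labeled for training.
y_train = np.array([[1, 0, 0],
                    [0, 0, 0],
                    [0, 0, 1],
                    [0, 0, 0],
                    [0, 0, 0]])
train_mask = np.array([True, False, True, False, False])

counts = train_examples_per_class(y_train, train_mask)
num_train = int(counts.sum())                          # total labeled training nodes
classes_without_labels = np.flatnonzero(counts == 0)   # classes with no training example
```

With 210 classes, 105 training examples would leave at least 105 classes without any labeled node.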

tkipf commented 6 years ago

This seems to be a bug (I used the dataset you refer to). I will look into it. Thanks for reporting!

zhongboyin commented 6 years ago

Hi, thanks for sharing. I am trying to test on the NELL dataset. I have already downloaded the NELL dataset and added your snippet to the utils.py file. I get the following error (with both nell.0.01 and nell.0.001):

Creating feature vectors for relations - this might take a while...
Done!
Traceback (most recent call last):
  File "train.py", line 32, in <module>
    features = preprocess_features(features)
  File "/home/bdsirs/yzb/experiment/gcn-master/gcn/utils.py", line 156, in preprocess_features
    r_inv = np.power(rowsum, -1).flatten()
ValueError: Integers to negative integer powers are not allowed.

tkipf commented 6 years ago

Convert the adjacency matrix to float and then it should work ;-)

zxj32 commented 5 years ago

@ZacheryGuan Hi, have you managed to reproduce the result on the NELL dataset so far?

tkipf commented 5 years ago

This should work:

rowsum = np.array(rowsum, dtype=np.float32)
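
In context, that cast goes right before the `np.power` call in `preprocess_features`. A minimal sketch of row normalization with the cast applied, assuming `features` is a SciPy sparse matrix with an integer dtype. The function name `row_normalize` is illustrative and the body is a reconstruction, not a copy of utils.py.

```python
import numpy as np
import scipy.sparse as sp

def row_normalize(features):
    """Divide each row of a sparse matrix by its sum; all-zero rows stay zero."""
    # Casting to float avoids "Integers to negative integer powers are not allowed".
    rowsum = np.array(features.sum(1), dtype=np.float32)
    with np.errstate(divide="ignore"):
        r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.0   # rows that sum to zero get a zero scale factor
    return sp.diags(r_inv).dot(features)

features = sp.csr_matrix(np.array([[1, 1], [0, 0], [2, 0]], dtype=np.int32))
normalized = row_normalize(features)
```

The relation nodes in NELL have all-zero rows before augmentation, which is why the `np.isinf` guard matters here.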

On Wed 13. Mar 2019 at 07:48 HanYuanyuaner wrote:

Hi, I met the same issue as @zhongboyin. I am trying to test on the NELL dataset. I have already downloaded the NELL dataset and added your snippet to the utils.py file. I get the following error (with both nell.0.01 and nell.0.001):

Traceback (most recent call last):
  File "train.py", line 33, in <module>
    features = preprocess_features(features)
  File "/home/nishome/hyuan/model/gcn/gcn/utils.py", line 153, in preprocess_features
    r_inv = np.power(rowsum, -1).flatten()
ValueError: Integers to negative integer powers are not allowed.

@tkipf I am not clear about your answer "Convert the adjacency matrix to float and then it should work". Which line should I modify?


linzhongping commented 5 years ago

Hello, I noticed that only some of the nodes in the NELL dataset have features. Would you mind sharing the entire NELL dataset with me?

tkipf commented 5 years ago

NELL in its entirety is available here: http://rtw.ml.cmu.edu/rtw/


tonyandsunny commented 5 years ago

Hi! I found that this code

    if not os.path.isfile("data/planetoid/{}.features.npz".format(dataset_str)):
        print("Creating feature vectors for relations - this might take a while...")
        features_extended = sp.hstack((features, sp.lil_matrix((features.shape[0], len(isolated_node_idx)))),
                                      dtype=np.int32).todense()
        features_extended[isolated_node_idx, features.shape[1]:] = np.eye(len(isolated_node_idx))
        features = sp.csr_matrix(features_extended)
        print("Done!")
        save_sparse_csr("data/planetoid/{}.features".format(dataset_str), features)
    else:
        features = load_sparse_csr("data/planetoid/{}.features.npz".format(dataset_str))

makes the feature matrix 65755 × 61278. However, the paper states that the feature dimension of the NELL dataset is 5414. When I remove this code and check the dimension, it matches the paper. Should I use this code or remove it?

tkipf commented 5 years ago

We augment node features with unique one-hot vectors (i.e., featurizing the identity of a node), which results in this shape of the feature matrix. I recommend leaving that in :)
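
A minimal sketch of this augmentation on toy data: each relation node (the snippet's `isolated_node_idx`) gets one extra column containing a single 1, so the feature width grows by the number of relation nodes. Unlike the snippet above, the sketch stays sparse instead of calling `.todense()`; that is a substitution made here for illustration, and `add_one_hot_columns` is a hypothetical helper name.

```python
import numpy as np
import scipy.sparse as sp

def add_one_hot_columns(features, isolated_node_idx):
    """Append one identity column per isolated node without densifying."""
    isolated_node_idx = np.asarray(isolated_node_idx)
    k = len(isolated_node_idx)
    # Entry (isolated_node_idx[i], i) is 1; everything else is 0.
    one_hot = sp.csr_matrix(
        (np.ones(k), (isolated_node_idx, np.arange(k))),
        shape=(features.shape[0], k),
    )
    return sp.hstack((features, one_hot), format="csr")

features = sp.csr_matrix(np.array([[1., 0.], [0., 0.], [0., 2.], [0., 0.]]))
extended = add_one_hot_columns(features, isolated_node_idx=[1, 3])
```

With the numbers from this thread, 61278 - 5414 = 55864 extra columns would suggest 55864 relation nodes.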


tonyandsunny commented 5 years ago

Thanks a lot! But when I run this code (without removing it), it needs 15 GB of memory, which is too much for my machine. Could you give me some suggestions?

tkipf commented 5 years ago

Maybe you can try running it on CPU? 15 GB of RAM should be fine for most machines.
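
A rough back-of-the-envelope check of where the memory goes, using the shapes quoted in this thread (65755 × 61278 after augmentation, 5414 columns before). It assumes that the `.todense()` call materializes the full matrix as int32 and that `np.eye` allocates its default float64 array; the helper `dense_gib` is illustrative.

```python
import numpy as np

def dense_gib(rows, cols, dtype):
    """Size in GiB of a dense rows x cols array of the given dtype."""
    return rows * cols * np.dtype(dtype).itemsize / 2**30

num_nodes, num_features, base_features = 65755, 61278, 5414
num_relation_nodes = num_features - base_features   # extra one-hot columns

# Dense int32 feature matrix produced by .todense(): about 15 GiB.
dense_features_gib = dense_gib(num_nodes, num_features, np.int32)
# Dense float64 identity produced by np.eye: about 23 GiB.
identity_gib = dense_gib(num_relation_nodes, num_relation_nodes, np.float64)
```

Both arrays are almost entirely zeros, so keeping the extra columns sparse would avoid these allocations.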


sudiptamondal08 commented 4 years ago

Hi Thomas,

I am using your code snippet to pre-process the NELL dataset from Zhilin Yang's GitHub, but I am getting the following error:

Creating feature vectors for relations - this might take a while...
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/sudip/Downloads/Compressed/GAT-master/execute_cora.py', wdir='C:/Users/sudip/Downloads/Compressed/GAT-master')
  File "C:\Users\sudip\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\Users\sudip\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/sudip/Downloads/Compressed/GAT-master/execute_cora.py", line 38, in <module>
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = process.load_data(dataset)
  File "C:\Users\sudip\Downloads\Compressed\GAT-master\utils\process.py", line 113, in load_data
    features_extended[isolated_node_idx, features.shape[1]:] = np.eye(len(isolated_node_idx))
  File "C:\Users\sudip\Anaconda3\lib\site-packages\numpy\lib\twodim_base.py", line 201, in eye
    m = zeros((N, M), dtype=dtype, order=order)
MemoryError

Could you please help me with this issue?

fansariadeh commented 4 years ago

Hi there,

I have the same issue with preprocessing the NELL dataset. I need to run Kipf's code on all the datasets mentioned in his paper. He posted the code snippet for preprocessing the data, but where should I add it in the code? Thank you in advance. Cheers, Fatima

sudiptamondal08 commented 4 years ago

As far as I remember I added it in the file used for graph processing in the utils folder.

Best, Dipto


fansariadeh commented 4 years ago

Dear Dipto, thanks a lot for your help. I knew where to add it, but it still does not work. Could you please take a look at my code? I am about to start my PhD and am very nervous about it.

fansariadeh commented 4 years ago

Loading nell.0.001 dataset...
../data/NELL/nell_datanell.0.001.nell
Traceback (most recent call last):
  File "train.py", line 45, in <module>
    A_hat, adj, B_hat, adjCAT, features, labels, idx_train, idx_val, idx_test = load_data()
  File "C:\Users\Fatima\Anaconda3\lib\site-packages\pygcn-0.1-py3.7.egg\pygcn\utils.py", line 24, in load_data
    test_idx_range_full = range(allx.shape[0], len(graph))
NameError: name 'allx' is not defined

fansariadeh commented 4 years ago

My code cannot find or read allx.

sudiptamondal08 commented 4 years ago

You need to download the dataset in the same format they have for Cora, Citeseer and Pubmed.
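
For reference, the preprocessing snippet expects variables such as `x`, `y`, `tx`, `ty`, `allx`, `ally` and `graph` to have been loaded from the Planetoid files earlier in `load_data`. A minimal sketch of such a loader, assuming the files are pickles named `ind.<dataset>.<name>` inside one directory. That naming, the `data_dir` argument and the function name are assumptions here, so adjust them to your layout.

```python
import os
import pickle
import tempfile

PLANETOID_NAMES = ("x", "y", "tx", "ty", "allx", "ally", "graph")

def load_planetoid_objects(data_dir, dataset, names=PLANETOID_NAMES):
    """Unpickle ind.<dataset>.<name> for each name and return them in a dict."""
    objects = {}
    for name in names:
        path = os.path.join(data_dir, "ind.{}.{}".format(dataset, name))
        with open(path, "rb") as f:
            # latin1 lets Python 3 read pickles written by Python 2.
            objects[name] = pickle.load(f, encoding="latin1")
    return objects

# Toy usage with stand-in files; the real files hold sparse matrices and a dict of lists.
with tempfile.TemporaryDirectory() as tmp:
    for name in PLANETOID_NAMES:
        with open(os.path.join(tmp, "ind.toy.{}".format(name)), "wb") as f:
            pickle.dump({"name": name}, f)
    data = load_planetoid_objects(tmp, "toy")
```

A `NameError` for `allx` typically means the snippet ran in a scope where these objects were never loaded.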


fansariadeh commented 4 years ago

I actually did that and saved the files in the same place. Are there any other changes I should make?

fansariadeh commented 4 years ago

I do not understand this line. There is no features.npz file in the dataset:

    if not os.path.isfile("data/planetoid/{}.features.npz".format(dataset)):
        print("Creating feature vectors for relations - this might take a while...")

It would be great if you could help me out.

fansariadeh commented 4 years ago

Dear Thomas,

I appreciate your informative work. After copying nell_data.tar and running the code, I get this error:

  File "train.py", line 45, in <module>
    A_hat, adj, B_hat, adjCAT, features, labels, idx_train, idx_val, idx_test = load_data()
  File "C:\Users\Fatima\Anaconda3\lib\site-packages\pygcn-0.1-py3.7.egg\pygcn\utils.py", line 24, in load_data
    test_idx_range_full = range(allx.shape[0], len(graph))
NameError: name 'allx' is not defined

Could you please let me know how to run the code?

fansariadeh commented 4 years ago

I did the same as @zhongboyin (downloaded the NELL dataset and added the snippet to utils.py) and got this error:

Loading nell.0.001 dataset...
../data/NELL/nell_datanell.0.001.nell
Traceback (most recent call last):
  File "train.py", line 45, in <module>
    A_hat, adj, B_hat, adjCAT, features, labels, idx_train, idx_val, idx_test = load_data()
  File "C:\Users\Fatima\Anaconda3\lib\site-packages\pygcn-0.1-py3.7.egg\pygcn\utils.py", line 24, in load_data
    test_idx_range_full = range(allx.shape[0], len(graph))
NameError: name 'allx' is not defined

Do you have any idea why that is?

fansariadeh commented 4 years ago

Dear Thomas, I know you are very busy and your time is precious, but it would be great if you could provide the code snippet you used to feed Citeseer or Pubmed, instead of Cora, into this code. I look forward to hearing from you.

Sincerely, Fatima

fansariadeh commented 4 years ago

It would be wonderful if you could explain a bit more about the datasets provided in https://github.com/kimiyoung/planetoid/commit/221ebe0236984018f23fcb7b039708ea4d45bfd4?diff=split These are .ally files, while we need node and edge information. Could you please share the snippet for using PubMed and Citeseer as well?

It is really appreciated. Cheers
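On the node and edge question: as far as I understand the Planetoid files, each `ind.<dataset>.*` file is a pickled Python object, and the `.graph` file holds a dict mapping a node index to the list of its neighbour indices (which is why the snippet above can pass it to `nx.from_dict_of_lists`). A rough sketch of turning that dict into an explicit edge list follows. `load_planetoid_object` and `graph_dict_to_edges` are hypothetical helper names, and dropping self-loops and treating edges as undirected are assumptions made for this example:

```python
import pickle


def load_planetoid_object(path):
    """Load one pickled Planetoid file, e.g. 'data/ind.citeseer.graph'."""
    with open(path, "rb") as f:
        # The files were pickled under Python 2, hence the latin1 encoding.
        return pickle.load(f, encoding="latin1")


def graph_dict_to_edges(graph):
    """Convert {node: [neighbours]} into a sorted list of undirected edges."""
    edges = set()
    for node, neighbours in graph.items():
        for nb in neighbours:
            if node != nb:  # assumption: self-loops are skipped
                edges.add((min(node, nb), max(node, nb)))
    return sorted(edges)


# Toy graph in the same dict-of-lists layout.
toy_graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
edges = graph_dict_to_edges(toy_graph)
print(edges)  # [(0, 1), (0, 2), (2, 3)]
```

For a real dataset you would call `graph_dict_to_edges(load_planetoid_object(path))` with the path to the downloaded `.graph` file.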

fansariadeh commented 4 years ago

I can run it now but the test accuracy is only 45%. The configuration I used is:
'dataset'       : 'nell.0.001',
'model'         : 'gcn',
'learning_rate' : 0.01,
'epochs'        : 200,
'hidden1'       : 64,
'dropout'       : 0.1,
'weight_decay'  : 1e-5,
'early_stopping': 10,
The weight_decay, dropout, and hidden units are the same as you used in your paper. Other configurations are the same as the default setting of your code. But the result is far from what you reported. Did I do something wrong? How can I reproduce your results?

Dear

As far as I remember I added it in the file used for graph processing in the utils folder. Best, Dipto

On Tue, Jun 30, 2020 at 2:15 AM fansariadeh wrote:

Hi there, I got the same issue with preprocessing the NELL dataset. I need to run Kipf's code with all the datasets mentioned in his paper. He mentioned the code snippet for preprocessing the data, but where should I add it in the code to run it? Thank you in advance. Cheers, Fatima

Dear

You need to download the dataset in the same format they have for Cora, Citeseer and Pubmed.

On Tue, Jun 30, 2020 at 3:27 AM fansariadeh wrote:

my code cannot find or read allx

Dear Dipto,

Can you please let me know how to pre-process the NELL dataset, or where to directly download a NELL dataset prepared in a structure compatible with this code? I mean the graph and edges in two separate files, like .content and .cites.

fansariadeh commented 4 years ago

load_sparse_csr

Hi Dr Kipf,

Could you please share the definition of "load_sparse_csr" function here? Thank you in advance.

Cheers, Fatima
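The original definitions are not shown in this thread. A common way to write such a pair is to store the three CSR arrays plus the shape with `np.savez`, sketched below. Treat this as an assumption about what `save_sparse_csr` and `load_sparse_csr` do, not as the author's code. Note that `np.savez` appends `.npz` to the file name, which matches the snippet saving to `...features` and loading from `...features.npz`:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp


def save_sparse_csr(filename, array):
    """Store the raw CSR arrays of a scipy.sparse.csr_matrix in an .npz file."""
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)


def load_sparse_csr(filename):
    """Rebuild the csr_matrix written by save_sparse_csr."""
    with np.load(filename) as loader:
        shape = tuple(int(s) for s in loader["shape"])
        return sp.csr_matrix(
            (loader["data"], loader["indices"], loader["indptr"]), shape=shape)


# Round trip on a toy matrix inside a temporary directory.
original = sp.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]], dtype=np.int32))
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "toy.features")
    save_sparse_csr(base, original)             # writes toy.features.npz
    restored = load_sparse_csr(base + ".npz")
```

Newer SciPy versions also ship `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which could replace both helpers if you do not need to read older files.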

fansariadeh commented 4 years ago

Hi, thanks for sharing. I am trying to test on the NELL dataset. I have already downloaded the NELL dataset and added your snippet to the utils.py file. I get the following error (with both nell.0.01 and nell.0.001):

Creating feature vectors for relations - this might take a while...
Done!
Traceback (most recent call last):
  File "train.py", line 32, in <module>
    features = preprocess_features(features)
  File "/home/bdsirs/yzb/experiment/gcn-master/gcn/utils.py", line 156, in preprocess_features
    r_inv = np.power(rowsum, -1).flatten()
ValueError: Integers to negative integer powers are not allowed.
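This error appears to come from the feature matrix being stored with an integer dtype (the snippet builds it with `dtype=np.int32`), because NumPy refuses to raise integer arrays to negative integer powers. One possible workaround is to cast the features to float before normalizing. The sketch below is a stand-alone row normalization written for illustration: `row_normalize` is a hypothetical name, and it only mirrors the `np.power(rowsum, -1)` step seen in the traceback rather than reproducing the repo's `preprocess_features`:

```python
import numpy as np
import scipy.sparse as sp


def row_normalize(features):
    """Row-normalize a sparse matrix after casting it to float."""
    features = sp.csr_matrix(features, dtype=np.float32)  # the actual fix
    rowsum = np.asarray(features.sum(axis=1)).flatten()
    with np.errstate(divide="ignore"):
        r_inv = np.power(rowsum, -1.0)
    r_inv[np.isinf(r_inv)] = 0.0  # all-zero rows stay all-zero
    return sp.diags(r_inv).dot(features)


# Integer-typed toy features, as produced with dtype=np.int32.
int_features = sp.csr_matrix(np.array([[1, 1], [0, 0], [0, 2]], dtype=np.int32))
normalized = row_normalize(int_features)
print(normalized.toarray())
```

In the repo itself, casting right before the call, e.g. `features = features.astype(np.float32)`, should have the same effect.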

Hi mate! I am struggling with NELL and need the definition of the "load_sparse_csr" function used in the snippet Kipf shared here. It would be great if you could share it with me.

Thank you in advance. Cheers, Fatima

fansariadeh commented 4 years ago

I can run it now but the test accuracy is only 45%. The configuration I used is:
'dataset'       : 'nell.0.001',
'model'         : 'gcn',
'learning_rate' : 0.01,
'epochs'        : 200,
'hidden1'       : 64,
'dropout'       : 0.1,
'weight_decay'  : 1e-5,
'early_stopping': 10,
The weight_decay, dropout, and hidden units are the same as you used in your paper. Other configurations are the same as the default setting of your code. But the result is far from what you reported. Did I make something wrong? How can I reproduce your results?

Hi there, I really need your help with NELL. Could you please share the parts you changed to make NELL run?

tkipf / gcn

Can't run on Nell dataset #14