yao8839836 / text_gcn

Graph Convolutional Networks for Text Classification. AAAI 2019

test data and mini-batch question #19

Open yeshenpy opened 5 years ago

yeshenpy commented 5 years ago

I found that the test data is also included in the adjacency matrix during the preprocessing stage, so if I have a sentence that does not appear in the dataset at all, do I have to recalculate the adjacency matrix and retrain the weights? Another question: can mini-batch training be used with this model, as with other deep networks? I am new to GCN and hope to get your advice!

yao8839836 commented 5 years ago

Hi @yeshenpy , thanks for your question.

Yes, the current code cannot make predictions on brand-new data without recalculating the adjacency matrix and retraining, because the GCN we use is transductive. We pointed this out in the "Discussion" section of our paper. There are some inductive GCN variants that can make predictions on brand-new data without retraining:

[1] Hamilton, W.; Ying, R.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS, 1024–1034.

[2] Chen, J.; Ma, T.; and Xiao, C. 2018. FastGCN: Fast learning with graph convolutional networks via importance sampling. In ICLR.

We tried their code, but it seems they don't work well with one-hot features. We are also trying to solve this problem in our own way. A possible simple solution is to build the graph without test docs (or even without any docs). When a new doc (say d_100) comes, we look up the word embeddings (for the words in d_100) learned by GCN and do some pooling (mean, average, LSTM) to generate a doc embedding for d_100; we can then select the dimension with the highest value in the doc embedding as the predicted label.
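For instance, a rough sketch of this lookup-and-pool fallback (all names and values here are hypothetical, not from the released code). The second-layer word embeddings have one dimension per class, so the argmax of the pooled doc embedding serves as the label:

import numpy as np

# `word_embeddings` maps each word to its second-layer GCN embedding
# (one dimension per class); `doc_words` is the tokenized new doc d_100.
word_embeddings = {"graph": np.array([0.1, 0.7, 0.2]),
                   "network": np.array([0.3, 0.5, 0.2])}  # toy values
doc_words = ["graph", "network", "unseen_word"]

# Look up embeddings for known words only, then mean-pool them.
vectors = [word_embeddings[w] for w in doc_words if w in word_embeddings]
doc_embedding = np.mean(vectors, axis=0)

# The dimension with the highest value is the predicted class.
predicted_label = int(np.argmax(doc_embedding))
print(predicted_label)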

The two works above also designed mini-batch training methods. The following link is the mini-batch version of the transductive GCN:

https://github.com/matenure/FastGCN/blob/master/pubmed-original_transductive_FastGCN.py

yeshenpy commented 5 years ago

Thank you very much for your reply. Because I have only just come into contact with GCN, I still lack understanding in some respects. I'd like to ask a few questions which may stray from Text GCN: (1) In the original GCN, when the number of nodes in the network is fixed, as long as the rules for computing the adj are fixed, we can take different data, compute the corresponding adj, feed the adj and features to the network for training, and finally get the trained weights. Is this understanding correct? I ask because I read another paper, "Exploring Visual Relationship for Image Captioning", which uses a GCN to obtain relations.

(2) If we use word embeddings, then the input feature dimension is fixed and the dimensions of the weights (which we want to train) are fixed. Is the number of nodes variable? For example, could we train with 100 samples ([100, 300]) and the corresponding adj, and in the next step feed 200 samples for training?

Although these questions may be a little silly, I still hope to get your help. Thank you very much!

yao8839836 commented 5 years ago

@yeshenpy

(1) Yes, you are right. Different datasets have different adjacency and feature matrices, and the trained weights are also dataset-specific.

(2) Using the notation in my paper: suppose we use word embeddings, there are N nodes in a given dataset, and the first-layer embeddings have 200 dimensions. Then X is an (N × 300) matrix, Ã is an (N × N) matrix, W_0 is a (300 × 200) matrix, and W_1 is a (200 × number of classes) matrix.
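For concreteness, a quick shape check of these matrices (toy sizes, plain NumPy; not from the released code):

import numpy as np

# Toy sizes: N nodes, 300-dim inputs, 200-dim hidden layer, 20 classes.
N, n_classes = 1000, 20
X = np.random.randn(N, 300)           # input word embeddings
A_tilde = np.eye(N)                   # normalized adjacency (identity stand-in)
W0 = np.random.randn(300, 200)        # first-layer weights
W1 = np.random.randn(200, n_classes)  # second-layer weights

H1 = np.maximum(A_tilde @ X @ W0, 0)  # ReLU(Ã X W_0), shape (N, 200)
logits = A_tilde @ H1 @ W1            # Ã H_1 W_1, shape (N, n_classes)
assert logits.shape == (N, n_classes)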

yao8839836 commented 5 years ago

@yeshenpy

Hi, I have found an inductive way to train Text GCN which can make predictions on brand-new data without retraining. I used a two-layer approximation version of FastGCN [1]:

https://github.com/yao8839836/fast_text_gcn

This inductive GCN version also supports mini-batching. The test accuracy on 20NG is about 0.832 with learning rate = 0.001, rank0 = 600, rank1 = 600, lower than the 0.8634 produced by our transductive Text GCN.

[1] Chen, J.; Ma, T.; and Xiao, C. 2018. FastGCN: Fast learning with graph convolutional networks via importance sampling. In ICLR.

yeshenpy commented 5 years ago

Thank you very much. I read several articles about this carefully, and I have read the FastGCN code, but I feel that the code just cuts the adj and then samples a specified number of neighbor nodes, which is almost the same as GraphSAGE except for the form of sampling. Inductive methods are generally less effective than transductive ones, but give a more general result. A transductive GCN does not need to fit parameters across a variety of data; it is a process from the specific to the specific, in which we only need to fit the specified training data (like one batch) and the specified test data. By contrast, in the inductive form of GCN, we need multiple batches (adj and features of the data) to fit the weights. Is this understanding correct?

yao8839836 commented 5 years ago

@yeshenpy

From my understanding, 'transductive' means the test data are included in the training process (they are part of the graph), but a transductive model can still use mini-batch training (multiple batches).

There is a mini-batch version of the transductive GCN in the FastGCN repository:

https://github.com/matenure/FastGCN/blob/master/pubmed-original_transductive_FastGCN.py
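For intuition, here is a minimal sketch of the idea behind that script (my own simplification, not its actual code): the graph stays fixed, and each step the loss is evaluated only on a sampled batch of labeled training nodes.

import numpy as np

def minibatch_indices(train_index, batch_size=256, shuffle=True):
    """Yield batches of training-node indices; the graph itself is not cut."""
    idx = np.array(train_index)
    if shuffle:
        np.random.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield idx[start:start + batch_size]

# Usage sketch:
# for batch_idx in minibatch_indices(train_index):
#     the loss is computed only on rows `batch_idx` of the full-graph output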

yeshenpy commented 5 years ago

Sorry to bother you again, but I found a problem while training: if the feature matrix is not an identity matrix but instead uses word features, the training loss does not converge. Because it runs on a personal computer, I use the FastGCN architecture. At the beginning, word_vector_map was randomly initialized. I found that train_loss, train_acc, and test_acc did not evolve steadily during training but fluctuated violently.

I also found that the model performs best on the test set after just one round of training. Is this because of the random initialization of the sentence features? Any advice?

yeshenpy commented 5 years ago

Could the reason for this be that the adj does not match the features? Or is it something else?

yao8839836 commented 5 years ago

@yeshenpy

Random initialization of the sentence features may be a problem; you could try averaging the word features as sentence features. Maybe the learning rate is too large?
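For example, a small sketch of this averaging idea (hypothetical names `word_vectors` and `sentences`, not code from either repository):

import numpy as np

# Stand-in pretrained word vectors and tokenized sentences.
word_vectors = {"text": np.random.randn(300), "gcn": np.random.randn(300)}
sentences = [["text", "gcn"], ["gcn", "oov"]]

def sentence_feature(tokens, dim=300):
    # Mean of the known word vectors, instead of a random vector.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

features = np.stack([sentence_feature(s) for s in sentences])
print(features.shape)  # (2, 300)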

yeshenpy commented 5 years ago

Ha, thank you very much for your reply. If I want to use inductive FastGCN and use the text_gcn code to build the adj and features, should I first change adj_train = adj[train_index, :][:, train_index] to adj_train = adj[train_index, :][:, train_index + vocab_index]? Because I found that if vocab_index is not added, adj_train will be a zero matrix. https://github.com/matenure/FastGCN/blob/master/pubmed_inductive_appr2layers.py

yao8839836 commented 5 years ago

@yeshenpy

Hi, I used the same code as in pubmed_inductive_appr2layers.py. No need to use vocab_index, because word nodes are not train, val, or test nodes; their indices come after the train and val indices and before the test indices.


def main(rank1, rank0):
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask, _, _ = load_corpus(FLAGS.dataset)
    train_index = np.where(train_mask)[0]
    adj_train = adj[train_index, :][:, train_index]


def load_corpus(dataset_str):

    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'adj']
    objects = []
    for i in range(len(names)):
        with open("data/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))

    x, y, tx, ty, allx, ally, adj = tuple(objects)
    print(x.shape, y.shape, tx.shape, ty.shape, allx.shape, ally.shape)

    features = sp.vstack((allx, tx)).tolil()
    features = sp.identity(features.shape[0])  # one-hot (identity) features
    labels = np.vstack((ally, ty))
    print(len(labels))

    train_idx_orig = parse_index_file(
        "data/{}.train.index".format(dataset_str))
    train_size = len(train_idx_orig)

    val_size = train_size - x.shape[0]
    test_size = tx.shape[0]

    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + val_size)
    idx_test = range(allx.shape[0], allx.shape[0] + test_size)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    y_train = np.zeros(labels.shape)
    y_val = np.zeros(labels.shape)
    y_test = np.zeros(labels.shape)
    y_train[train_mask, :] = labels[train_mask, :]
    y_val[val_mask, :] = labels[val_mask, :]
    y_test[test_mask, :] = labels[test_mask, :]

    # build a symmetric adjacency matrix
    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

    return adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask, train_size, test_size

yeshenpy commented 5 years ago

This means that I should use the load_corpus() function from text_gcn to replace the load_data() and load_original_data() functions in FastGCN?

yao8839836 commented 5 years ago

@yeshenpy

No need to replace them; just add load_corpus() to utils.py in the FastGCN code and use it in pubmed_inductive_appr2layers.py as above.

yeshenpy commented 5 years ago

I am very sorry that I may not have described my problem clearly. Let me elaborate:

(1) First of all, when we design the adj, its shape should be [real_train_size + val_size + vocab_size + test_size, real_train_size + val_size + vocab_size + test_size], right? The content of adj is mainly word-word and word-doc relations, so if we take train_adj = adj[:real_train_size, :][:, :real_train_size], we get an empty matrix, because all relations involve the vocab, but we filter the vocab out directly. At the same time, we also use identity-matrix features, and an empty adj (which becomes an identity matrix after a series of processing) means the doc-word relations we calculated are not used at all.

(2) Because the structure of adj is [real_train_size + val_size + vocab_size + test_size, real_train_size + val_size + vocab_size + test_size], when we use test_index, should we add vocab_size to obtain the correct final index?
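To make the layout I mean concrete, here is a toy sketch (arbitrary sizes, my own variable names) of the node ordering [train docs | val docs | vocab | test docs]:

import numpy as np

# Toy sizes; in practice these come from the dataset.
real_train_size, val_size, vocab_size, test_size = 5, 2, 10, 3
n = real_train_size + val_size + vocab_size + test_size

train_index = np.arange(0, real_train_size)
val_index = np.arange(real_train_size, real_train_size + val_size)
vocab_index = np.arange(real_train_size + val_size,
                        real_train_size + val_size + vocab_size)
test_index = np.arange(n - test_size, n)  # test docs sit after the vocab

# Slicing by train indices alone drops every doc-word edge, because each
# edge touches at least one vocab node:
# adj[:real_train_size, :][:, :real_train_size] -> (nearly) empty matrix
print(train_index, vocab_index, test_index)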

yao8839836 commented 5 years ago

@yeshenpy

I think you are right, it appears I have made a mistake on this. I am sorry, and I will correct my results.

yeshenpy commented 5 years ago

However, I got good accuracy in my test under this setup, which may not use the relations at all and whose test index may not be correct; this confuses me a lot.

yao8839836 commented 5 years ago

@yeshenpy

Are you using normADJ[test_index, :][:, :] as the test graph, or normADJ[test_index, :][:, test_index + vocab_index]?

yeshenpy commented 5 years ago

I use my own data, with adj_train = adj[train_index, :][:, train_index], normADJ_test = nontuple_preprocess_adj(adj[train_test_index, :][:, train_test_index]), and testSupport = sparse_to_tuple(normADJ_test[len(train_index):, :]). I don't change the important code in FastGCN.

When I run pubmed-original_inductive_FastGCN.py I get 90% accuracy, but when I run pubmed_inductive_appr2layers.py the loss does not converge and the accuracy is nearly 60%.

yeshenpy commented 5 years ago

To deal with question (2), I add vocab_size to re-index to the correct test_index, but I fail to get good accuracy; the results look like random guessing.

yao8839836 commented 5 years ago

@yeshenpy

Hi, I changed the main function in pubmed_inductive_appr2layers.py as follows and can achieve reasonable results (0.8319 for 20NG with learning rate 0.001 and rank0 = rank1 = 600). The doc-word relations are preserved in adj_train, and the test support is testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[test_index, :])], which contains the edges between test doc nodes and all other nodes.

I made the code available at https://github.com/yao8839836/fast_text_gcn

I think adding vocab_size to re-index the test_index may not work, because word nodes don't have labels. Note that I rewrote nontuple_preprocess_adj in utils.py so that it can process non-square matrices.


Main function:

def main(rank1, rank0):

    # adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(FLAGS.dataset)
    adj, features, y_train, y_val, y_test, y_vocab, train_mask, val_mask, test_mask, vocab_mask, _, _ = load_corpus(FLAGS.dataset)
    train_index = np.where(train_mask)[0]
    vocab_index = np.where(vocab_mask)[0]
    tmp_index = list(train_index) + list(vocab_index)
    adj_train = adj[train_index, :][:, tmp_index]          # keeps doc-word edges
    adj_train_vocab = adj[tmp_index, :][:, tmp_index]
    print(len(train_mask))
    train_mask = train_mask[train_index]
    y_train = y_train[train_index]
    val_index = np.where(val_mask)[0]
    # adj_val = adj[val_index, :][:, val_index]
    val_mask = val_mask[val_index]
    y_val = y_val[val_index]
    test_index = np.where(test_mask)[0]
    # adj_test = adj[test_index, :][:, test_index]
    test_mask = test_mask[test_index]
    y_test = y_test[test_index]
    numNode_train_1 = adj_train.shape[1]
    numNode_train_0 = adj_train.shape[0]
    # print("numNode", numNode)

    # Some preprocessing
    features = nontuple_preprocess_features(features).todense()
    train_features = features[tmp_index]

    if FLAGS.model == 'gcn_appr':
        normADJ_train = nontuple_preprocess_adj(adj_train)
        normADJ_train_vocab = nontuple_preprocess_adj(adj_train_vocab)
        print(normADJ_train)
        normADJ = nontuple_preprocess_adj(adj)
        # normADJ_val = nontuple_preprocess_adj(adj_val)
        # normADJ_test = nontuple_preprocess_adj(adj_test)

        num_supports = 2
        model_func = GCN_APPRO
    else:
        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))

    # Define placeholders
    placeholders = {
        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
        'features': tf.placeholder(tf.float32, shape=(None, features.shape[1])),
        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
        'labels_mask': tf.placeholder(tf.int32),
        'dropout': tf.placeholder_with_default(0., shape=()),
        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout
    }

    # Create model
    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)

    # Initialize session
    sess = tf.Session()

    # Define model evaluation function
    def evaluate(features, support, labels, mask, placeholders):
        t_test = time.time()
        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)
        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)
        return outs_val[0], outs_val[1], (time.time() - t_test)

    # Init variables
    sess.run(tf.global_variables_initializer())

    cost_val = []

    p0 = column_prop(normADJ_train)

    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]
    valSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[val_index, :])]
    testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[test_index, :])]

    t = time.time()
    # Train model
    for epoch in range(FLAGS.epochs):
        t1 = time.time()

        n = 0
        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=256, shuffle=True):
            [normADJ_batch, y_train_batch, train_mask_batch] = batch
            if sum(train_mask_batch) < 1:
                continue
            # print(normADJ_batch)
            p1 = column_prop(normADJ_batch)
            # print(p1.shape)
            q1 = np.random.choice(np.arange(numNode_train_1), rank1, p=p1)  # top layer
            # q0 = np.random.choice(np.arange(numNode_train), rank0, p=p0)  # bottom layer
            support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))
            # print(q1)
            p2 = column_prop(normADJ_train_vocab[q1, :])
            # print(p2.shape)
            q0 = np.random.choice(np.arange(numNode_train_1), rank0, p=p2)
            support0 = sparse_to_tuple(normADJ_train_vocab[q1, :][:, q0])
            # print(y_train_batch, train_mask_batch, len(train_mask))
            features_inputs = sp.diags(1.0 / (p2[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation

            # Construct feed dictionary
            feed_dict = construct_feed_dict(features_inputs, [support0, support1], y_train_batch, train_mask_batch,
                                            placeholders)
            feed_dict.update({placeholders['dropout']: FLAGS.dropout})

            # Training step
            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)

        # Validation
        cost, acc, duration = evaluate(features, valSupport, y_val, val_mask, placeholders)
        cost_val.append(cost)

        # # Print results
        print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(outs[1]),
              "train_acc=", "{:.5f}".format(outs[2]), "val_loss=", "{:.5f}".format(cost),
              "val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t1))

        if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):
            # print("Early stopping...")
            break

    train_duration = time.time() - t
    # Testing
    test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test, test_mask,
                                                  placeholders)
    print("rank1 = {}".format(rank1), "rank0 = {}".format(rank0), "cost=", "{:.5f}".format(test_cost),
          "accuracy=", "{:.5f}".format(test_acc), "training time per epoch=", "{:.5f}".format(train_duration / epoch))

nontuple_preprocess_adj in utils.py:

def nontuple_preprocess_adj(adj):

    if adj.shape[0] == adj.shape[1]:
        adj_normalized = normalize_adj(sp.eye(adj.shape[0]) + adj)
    else:
        # non-square case: normalize rows and columns separately
        rowsum = np.array(adj.sum(1))
        rowdegree_inv = np.power(rowsum, -0.5).flatten()
        rowdegree_inv[np.isinf(rowdegree_inv)] = 0.
        rowdegree_mat_inv = sp.diags(rowdegree_inv)

        colsum = np.array(adj.sum(0))
        coldegree_inv = np.power(colsum, -0.5).flatten()
        coldegree_inv[np.isinf(coldegree_inv)] = 0.
        coldegree_mat_inv = sp.diags(coldegree_inv)
        adj_normalized = rowdegree_mat_inv.dot(adj).dot(coldegree_mat_inv).tocoo()
    return adj_normalized.tocsr()
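For example, the non-square branch computes D_r^{-1/2} A D_c^{-1/2}; a quick sanity check on a toy 2 × 3 matrix (assuming numpy as np and scipy.sparse as sp, as in utils.py):

import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1., 2., 0.],
                            [0., 1., 1.]]))  # 2 x 3, non-square
normA = nontuple_preprocess_adj(A)
print(normA.toarray())  # each entry scaled by 1/sqrt(row deg * col deg)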
yeshenpy commented 5 years ago

Hi, I am very sorry for my late reply. My experiment used the following code. I may have a problem understanding the nontuple_preprocess_adj function; I didn't change it. Is this correct?

first:

train_index = np.where(train_mask)[0]
train_vocab_index = np.hstack((train_index, vocab_index))
adj_train = adj[train_vocab_index, :][:, train_vocab_index]

second:

if FLAGS.model == 'gcn_appr':
    normADJ_train = nontuple_preprocess_adj(adj_train)
    sample_normADJ_train = normADJ_train[:len(train_index)]
    normADJ = nontuple_preprocess_adj(adj)
    num_supports = 2
    model_func = GCN_APPRO

third:

for batch in iterate_minibatches_listinputs([sample_normADJ_train, y_train, train_mask], batchsize=256, shuffle=True):

yao8839836 commented 5 years ago

@yeshenpy

No problem, I am also late :) I think your code is correct. Are the results reasonable?

yeshenpy commented 5 years ago

Hi, my experimental results were satisfactory, but I have a question. I have read a lot of papers on GCN. If I want to embed a GCN into an end-to-end network, can I redefine a new graph every time, i.e., a dynamic graph rather than a static one? For example, for an image I select 10 detected targets as nodes to construct the graph, where the relation is the spatial relation among the 10 targets. For different images, the relations among the 10 nodes keep changing, which means the graph needs to be constantly reconstructed. Is that feasible? Is there any similar article or code?

best pengyi


yao8839836 commented 5 years ago

@yeshenpy

Hi, I think it's feasible to construct a mini-batch of test graphs with a limited number of nodes.

Here are some similar papers and code:

Huang, W., Zhang, T., Rong, Y. and Huang, J., 2018. Adaptive sampling towards fast graph representation learning. In Advances in Neural Information Processing Systems (pp. 4563-4572).

https://github.com/alibaba/euler/wiki/ScalableGCN

Best wishes!

yeshenpy commented 5 years ago

Hi, I have read the paper you provided, but I don't think it addresses the core of my concern; it is very similar to FastGCN. I hope we can discuss this further. Below is an article I have a lot of doubts about after reading it: "Exploring visual relationship for image captioning" http://openaccess.thecvf.com/content_ECCV_2018/html/Ting_Yao_Exploring_Visual_Relationship_ECCV_2018_paper.html

I think you'll have the same question as I did after reading it: can we keep redefining the graph during training? Is that a problem? I hope to get your opinion.

Best pengyi


yao8839836 commented 5 years ago

@yeshenpy

Hi, I think this kind of work uses the same GCN layer but different graphs (one per example) as input.

For example, in https://github.com/malllabiisc/NeuralDater, every sentence has a syntactic graph as input (defined as a placeholder in TensorFlow), but the GCN layer is the same one; different graphs will have different GCN layer weights W_i. Please see neural_dater.py in the project.
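A minimal TF 1.x sketch of this pattern (hypothetical shapes and names, not code from NeuralDater): the adjacency is a placeholder fed per example, while the GCN weight is a single shared variable trained across all examples.

import tensorflow as tf  # TF 1.x, as in the projects above

# Hypothetical sizes: each example has up to N nodes with d-dim features.
N, d, d_out = 10, 64, 32
adj_ph = tf.placeholder(tf.float32, shape=(N, N))   # a new graph per example
feat_ph = tf.placeholder(tf.float32, shape=(N, d))
W = tf.get_variable("gcn_w", shape=(d, d_out))      # shared across examples

# One graph-convolution layer: ReLU(A X W).
output = tf.nn.relu(tf.matmul(tf.matmul(adj_ph, feat_ph), W))

# Each sess.run feeds a different adjacency, but W is trained jointly:
# sess.run(output, feed_dict={adj_ph: adj_of_example, feat_ph: feats})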

yeshenpy commented 5 years ago

Hi, all of a sudden there were some other things that needed to be dealt with, so I didn't reply for a long time; I'm really sorry for that. But when I came back to test the code, I ran into a big problem: I tried your method but the results were not good. I hope to get your help. Thanks a lot. The main code:

def main(rank1, rank0):
    adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask, vocab_index = load_data_original(FLAGS.dataset)

    train_index = np.where(train_mask)[0]
    train_vocab_index = np.hstack((train_index, vocab_index))

    adj_train = adj[train_index, :][:, train_vocab_index]
    adj_train_vocab = adj[train_vocab_index, :][:, train_vocab_index]

    train_mask = train_mask[train_index]
    y_train = y_train[train_index]

    test_index = np.where(test_mask)[0]
    test_mask = test_mask[test_index]
    y_test = y_test[test_index]

    numNode_train_1 = adj_train.shape[1]
    numNode_train_0 = adj_train.shape[0]

    # Some preprocessing
    features = sp.identity(features.shape[0])
    features = nontuple_preprocess_features(features).todense()
    train_features = features[train_vocab_index]

    if FLAGS.model == 'gcn_appr':
        normADJ_train = nontuple_preprocess_adj(adj_train)
        normADJ_train_vocab = nontuple_preprocess_adj(adj_train_vocab)

        # TODO ????
        normADJ = nontuple_preprocess_adj(adj)
        # normADJ_val = nontuple_preprocess_adj(adj_val)
        # normADJ_test = nontuple_preprocess_adj(adj_test)

        num_supports = 2
        model_func = GCN_APPRO
    else:
        raise ValueError('Invalid argument for model: ' + str(FLAGS.model))

    # Define placeholders
    placeholders = {
        'support': [tf.sparse_placeholder(tf.float32) for _ in range(num_supports)],
        'features': tf.placeholder(tf.float32, shape=(None, features.shape[1])),
        'labels': tf.placeholder(tf.float32, shape=(None, y_train.shape[1])),
        'labels_mask': tf.placeholder(tf.int32),
        'dropout': tf.placeholder_with_default(0., shape=()),
        'num_features_nonzero': tf.placeholder(tf.int32)  # helper variable for sparse dropout
    }

    # Create model
    model = model_func(placeholders, input_dim=features.shape[-1], logging=True)

    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ['CUDA_VISIBLE_DEVICES'] = "1"
    session_conf = tf.ConfigProto(
        log_device_placement=True,
        allow_soft_placement=True)
    session_conf.gpu_options.allow_growth = True
    # Initialize session
    sess = tf.Session(config=session_conf)

    # Define model evaluation function
    def evaluate(features, support, labels, mask, placeholders):
        t_test = time.time()
        feed_dict_val = construct_feed_dict(features, support, labels, mask, placeholders)
        outs_val = sess.run([model.loss, model.accuracy], feed_dict=feed_dict_val)
        return outs_val[0], outs_val[1], (time.time() - t_test)

    # Init variables
    sess.run(tf.global_variables_initializer())

    cost_val = []

    p0 = column_prop(normADJ_train)

    # testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ)]
    # valSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[val_index, :])]
    testSupport = [sparse_to_tuple(normADJ), sparse_to_tuple(normADJ[test_index, :])]

    t = time.time()
    # Train model
    for epoch in range(FLAGS.epochs):
        t1 = time.time()

        n = 0
        mean_loss = []
        mean_ac = []
        for batch in iterate_minibatches_listinputs([normADJ_train, y_train, train_mask], batchsize=256, shuffle=True):
            [normADJ_batch, y_train_batch, train_mask_batch] = batch
            if sum(train_mask_batch) < 1:
                continue
            p1 = column_prop(normADJ_batch)
            q1 = np.random.choice(np.arange(numNode_train_1), rank1, p=p1)  # top layer

            support1 = sparse_to_tuple(normADJ_batch[:, q1].dot(sp.diags(1.0 / (p1[q1] * rank1))))

            p2 = column_prop(normADJ_train_vocab[q1, :])
            q0 = np.random.choice(np.arange(numNode_train_1), rank0, p=p2)
            support0 = sparse_to_tuple(normADJ_train_vocab[q1, :][:, q0])
            features_inputs = sp.diags(1.0 / (p2[q0] * rank0)).dot(train_features[q0, :])  # selected nodes for approximation

            # Construct feed dictionary
            feed_dict = construct_feed_dict(features_inputs, [support0, support1], y_train_batch, train_mask_batch,
                                            placeholders)
            feed_dict.update({placeholders['dropout']: FLAGS.dropout})

            # Training step
            outs = sess.run([model.opt_op, model.loss, model.accuracy], feed_dict=feed_dict)
            mean_loss.append(outs[1])
            mean_ac.append(outs[2])
        print("train process : mean_loss:", "{:.8f}".format(np.mean(mean_loss)),
              "  mean_accuarcy:", "{:.8f}".format(np.mean(mean_ac)))
        # # Validation
        # cost, acc, duration = evaluate(features, valSupport, y_val, val_mask, placeholders)
        # cost_val.append(cost)
        #
        # # # Print results
        # print("Epoch:", '%04d' % (epoch + 1), "val_loss=", "{:.5f}".format(cost),
        #       "val_acc=", "{:.5f}".format(acc), "time=", "{:.5f}".format(time.time() - t1))

        # if epoch > FLAGS.early_stopping and cost_val[-1] > np.mean(cost_val[-(FLAGS.early_stopping + 1):-1]):
        #     # print("Early stopping...")
        #     break
        if epoch % 5 == 0:
            # Testing
            test_cost, test_acc, test_duration = evaluate(features, testSupport, y_test, test_mask,
                                                          placeholders)
            print("rank1 = {}".format(rank1), "rank0 = {}".format(rank0), "cost=", "{:.5f}".format(test_cost),
                  "accuracy=", "{:.5f}".format(test_acc))


Just the same as your code. The main settings:

if __name__ == "__main__":

    print("DATASET:", FLAGS.dataset)

    k = 50

    main(k, k)

flags = tf.app.flags

FLAGS = flags.FLAGS

flags.DEFINE_string('dataset', 'R8', 'Dataset string.')  # 'cora', 'citeseer', 'pubmed'
flags.DEFINE_string('model', 'gcn_appr', 'Model string.')  # 'gcn', 'gcn_appr'
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')
flags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')
flags.DEFINE_float('dropout', 0.0, 'Dropout rate (1 - keep probability).')
flags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')
flags.DEFINE_integer('early_stopping', 30, 'Tolerance for early stopping (# of epochs).')
flags.DEFINE_integer('max_degree', 3, 'Maximum Chebyshev polynomial degree.')

The results I get are very strange and not nearly as good as, say, CNNText

The result is:

rank1 = 50 rank0 = 50 cost= 2.10857 accuracy= 0.73687
train process : mean_loss: 2.18299270 mean_accuarcy: 0.13342927
train process : mean_loss: 2.08987427 mean_accuarcy: 0.19469573
train process : mean_loss: 2.16004872 mean_accuarcy: 0.14350329
train process : mean_loss: 2.13611579 mean_accuarcy: 0.15912829
train process : mean_loss: 2.11484408 mean_accuarcy: 0.17208059
rank1 = 50 rank0 = 50 cost= 2.11105 accuracy= 0.72956
train process : mean_loss: 2.17817831 mean_accuarcy: 0.13877468
train process : mean_loss: 2.15126252 mean_accuarcy: 0.17495888
train process : mean_loss: 2.12972283 mean_accuarcy: 0.16159539
train process : mean_loss: 2.17729020 mean_accuarcy: 0.14309211
train process : mean_loss: 2.15094995 mean_accuarcy: 0.15337171
rank1 = 50 rank0 = 50 cost= 2.11457 accuracy= 0.70443
train process : mean_loss: 2.15329981 mean_accuarcy: 0.14555921
train process : mean_loss: 2.16662288 mean_accuarcy: 0.13301809
train process : mean_loss: 2.09061027 mean_accuarcy: 0.17783718
train process : mean_loss: 2.14036870 mean_accuarcy: 0.14185855
train process : mean_loss: 2.09271693 mean_accuarcy: 0.17763157
rank1 = 50 rank0 = 50 cost= 2.09957 accuracy= 0.68296
train process : mean_loss: 2.16462994 mean_accuarcy: 0.13856907
train process : mean_loss: 2.11122751 mean_accuarcy: 0.16241777
train process : mean_loss: 2.11914682 mean_accuarcy: 0.17228618
train process : mean_loss: 2.09339857 mean_accuarcy: 0.17598684
train process : mean_loss: 2.12867928 mean_accuarcy: 0.16776316
rank1 = 50 rank0 = 50 cost= 2.09475 accuracy= 0.70809
train process : mean_loss: 2.13009644 mean_accuarcy: 0.16036184
train process : mean_loss: 2.12422848 mean_accuarcy: 0.17824836
train process : mean_loss: 2.09685731 mean_accuarcy: 0.18503289
train process : mean_loss: 2.22818708 mean_accuarcy: 0.10793585
train process : mean_loss: 2.17838335 mean_accuarcy: 0.14555921
rank1 = 50 rank0 = 50 cost= 2.10586 accuracy= 0.70169
train process : mean_loss: 2.15089488 mean_accuarcy: 0.17002468
train process : mean_loss: 2.21109462 mean_accuarcy: 0.12643914
train process : mean_loss: 2.08513427 mean_accuarcy: 0.22224507
train process : mean_loss: 2.19628286 mean_accuarcy: 0.12150493

The training loss does not converge at all. In addition, build_graph.py was used to build the R8 dataset. The code of my load_data_original method is as follows:

def load_data_original(dataset_str):
    """Load data."""
    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'adj']
    objects = []
    for i in range(len(names)):
        with open("data/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))

    x, y, tx, ty, allx, ally, graph = tuple(objects)

    train_idx_orig = parse_index_file("data/{}.train.index".format(dataset_str))
    train_size = len(train_idx_orig)
    val_size = train_size - x.shape[0]

    vocab_index = np.array(range(len(y) + val_size, len(ally)))
    features = sp.vstack((allx, tx)).tolil()

    adj = graph
    labels = np.vstack((ally, ty))

    idx_test = range(allx.shape[0], allx.shape[0] + tx.shape[0])

    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + val_size)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    y_train = np.zeros(labels.shape)
    y_val = np.zeros(labels.shape)
    y_test = np.zeros(labels.shape)
    y_train[train_mask, :] = labels[train_mask, :]
    y_val[val_mask, :] = labels[val_mask, :]
    y_test[test_mask, :] = labels[test_mask, :]

    adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

    return adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask, vocab_index


yeshenpy commented 5 years ago

I found a mistake: the batch size was too big, so I changed it to 20 and the result is better, as follows:

train process : mean_loss: 1.47825873 mean_accuarcy: 0.61624086
train process : mean_loss: 1.43658328 mean_accuarcy: 0.62755477
train process : mean_loss: 1.42947161 mean_accuarcy: 0.63485402
train process : mean_loss: 1.44350576 mean_accuarcy: 0.62901455
train process : mean_loss: 1.42589819 mean_accuarcy: 0.64014596
rank1 = 100 rank0 = 100 cost= 1.33792 accuracy= 0.91366

And the following is CNN + one-layer GCN:

train process : mean_loss: 0.57466590 mean_accuarcy: 0.90656930
train process : mean_loss: 0.59261906 mean_accuarcy: 0.90419710
train process : mean_loss: 0.58938766 mean_accuarcy: 0.90711677
train process : mean_loss: 0.58605176 mean_accuarcy: 0.90565693

test process : mean_loss: 2.21752882 mean_accuarcy: 0.93101871
train process : mean_loss: 0.56950825 mean_accuarcy: 0.91131389
training time by far= 192.02658 epoch = 11 cost= 2.23968291 accuracy= 0.94655097

The two-layer GCN's performance is lower than the combination of CNN and GCN, and it also performs worse on the training data (pumbed_original_inductive_fastgcn.py). So I have two questions:

  1. Why is the two-layer GCN method worse than the combined CNN and GCN method, and why does it take longer to converge? The two-layer GCN performed better in the paper.

  2. Why is the accuracy on the training set always lower than on the test set for the two-layer GCN? Usually the test-set accuracy is lower than the train-set accuracy.
yao8839836 commented 5 years ago

@yeshenpy

Hi, thanks for trying this. Maybe you can try an embedding dimension higher than 16. The inductive version is expected to perform worse than the transductive version. I also found that two-layer GCN + CNN performs worse than one-layer GCN + CNN; it seems first-order neighborhood information is enough in the inductive + CNN setting, since the CNN also captures local syntactic and semantic information. Also note that the "mean accuracy" in your results is not the final accuracy on your training set.