thunlp / OpenNE

An Open-Source Package for Network Embedding (NE)
MIT License
1.68k stars 485 forks

Results w.r.t. the testing #80

Closed d12306 closed 4 years ago

d12306 commented 5 years ago

Hi @zzy14, thanks for your implementation. However, regarding your testing method: you pass the number of classes per node as prior knowledge to the TopkRankClassifier, which is not theoretically strict. I instead tested the performance with the following code, and the results are not good.

import sklearn.metrics
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

for i in range(epoch):
    self.model1.train_one_epoch()
    self.model2.train_one_epoch()
    if i % 10 == 0:
        self.get_embeddings()
        # Collect the embeddings; note list(set(...)) does not
        # guarantee any particular key order.
        keys = list(set(self.vectors.keys()))
        # Loop variable renamed to 'k' so it no longer shadows the epoch counter 'i'.
        new_embedding = [self.vectors[k] for k in keys]
        # Testing. Labels are multi-hot vectors for multi-label classification.
        X_train, X_test, y_train, y_test = train_test_split(
            new_embedding, labels, test_size=0.5, random_state=123)
        clf = OneVsRestClassifier(LogisticRegression(), n_jobs=2)
        clf.fit(X_train, y_train)
        test_labels_pred = clf.predict(X_test)
        test_scores_pred = clf.predict_proba(X_test)
        acc = sklearn.metrics.accuracy_score(y_test, test_labels_pred)
        auc = sklearn.metrics.roc_auc_score(y_test, test_scores_pred)
        precision = sklearn.metrics.precision_score(y_test, test_labels_pred, average='macro')
        recall = sklearn.metrics.recall_score(y_test, test_labels_pred, average='macro')
        f1 = sklearn.metrics.f1_score(y_test, test_labels_pred, average='macro')
        print(f"Average accuracy: {acc}")
        print(f"Average AUC: {auc}")
        print(f"Macro precision: {precision}")
        print(f"Macro recall: {recall}")
        print(f"Macro F1: {f1}")

Could you please help me with this issue? Thanks.
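For context, the "prior knowledge" objection above is that a top-k-style evaluation sorts each node's predicted class probabilities and keeps exactly as many top classes as that node truly has, so the true label count leaks into the prediction. A minimal sketch of the difference on synthetic data (this is an illustration, not OpenNE's actual classifier code):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=300, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5, random_state=123)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
proba = clf.predict_proba(X_test)

# Standard evaluation: threshold each probability at 0.5 (no prior knowledge).
pred_thresh = (proba >= 0.5).astype(int)

# Top-k evaluation: for each node, keep its k most probable classes,
# where k is that node's TRUE number of labels, i.e. prior knowledge leaks in.
pred_topk = np.zeros_like(pred_thresh)
for row, (p, y) in enumerate(zip(proba, Y_test)):
    k = int(y.sum())
    top = np.argsort(p)[-k:] if k > 0 else []
    pred_topk[row, top] = 1

print("macro F1, thresholded:", f1_score(Y_test, pred_thresh, average="macro"))
print("macro F1, top-k:", f1_score(Y_test, pred_topk, average="macro"))
```

The two scores generally differ; the top-k variant can only be computed when the true label counts are available at test time.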

zzy14 commented 5 years ago

Could you please describe your input data?

d12306 commented 5 years ago

Hi @zzy14, the input is the embeddings of all the nodes, in order, and the label for every node is a multi-hot vector describing the classes the node belongs to, such as [1,0,0,1,..,1]. Thanks.
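If the labels start out as per-node class lists rather than multi-hot vectors, scikit-learn's `MultiLabelBinarizer` can build the matrix described above. A small sketch with made-up node IDs:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical input: node id -> list of classes the node belongs to.
node_labels = {"n0": [0, 3], "n1": [1], "n2": [0, 1, 3]}

keys = sorted(node_labels)  # fix one node order up front
mlb = MultiLabelBinarizer()
# Each row becomes a multi-hot vector aligned with `keys`.
labels = mlb.fit_transform([node_labels[k] for k in keys])
```

Keeping the same `keys` list when stacking the embeddings guarantees that row i of the features and row i of the labels refer to the same node.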

zzy14 commented 5 years ago

According to your code, I think the evaluation process is correct.

d12306 commented 5 years ago

@zzy14, thanks. It turns out I did not keep the embeddings in order properly. After fixing that, the testing performance is great.
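For reference, the ordering bug resolved here can be avoided by driving both the feature matrix and the label matrix from one sorted key list, instead of `list(set(...))`, whose order is arbitrary. A sketch with hypothetical stand-ins for the model's outputs:

```python
import numpy as np

# Hypothetical outputs: both dicts are keyed by node id.
vectors = {"n2": [0.3, 0.1], "n0": [0.9, 0.2], "n1": [0.4, 0.8]}
node_labels = {"n0": [1, 0], "n1": [0, 1], "n2": [1, 1]}

# One sorted key list builds BOTH arrays, so row i of X and row i of y
# always refer to the same node; list(set(...)) gives no such guarantee.
keys = sorted(vectors)
X = np.array([vectors[k] for k in keys])
y = np.array([node_labels[k] for k in keys])
```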