shenweichen / GraphEmbedding

Implementation and experiments of graph embedding algorithms.
MIT License

report the results on all datasets #11

Open dawnranger opened 5 years ago

dawnranger commented 5 years ago

Results of node2vec, deepwalk, line, sdne and struc2vec on all datasets. I hope this will help anyone who is interested in this project.
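For context, a minimal sketch of the kind of pipeline behind these numbers, following the API shown in this repo's README and examples (paths, the `Classifier`/`read_node_label` helpers, and the hyper-parameters here are the example defaults, not necessarily the exact settings used for the tables below):

```python
import networkx as nx
from sklearn.linear_model import LogisticRegression

from ge import DeepWalk
from ge.classify import read_node_label, Classifier

# load the wiki graph shipped with the repo
G = nx.read_edgelist('../data/wiki/Wiki_edgelist.txt',
                     create_using=nx.DiGraph(), nodetype=None,
                     data=[('weight', int)])

# train embeddings (example-default hyper-parameters)
model = DeepWalk(G, walk_length=10, num_walks=80, workers=1)
model.train(window_size=5, iter=3)
embeddings = model.get_embeddings()

# evaluate with a logistic-regression node classifier on an 80% split
X, Y = read_node_label('../data/wiki/wiki_labels.txt')
clf = Classifier(embeddings=embeddings, clf=LogisticRegression())
print(clf.split_train_evaluate(X, Y, 0.8))
```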

wiki

| Alg | micro | macro | samples | weighted | acc | NMI |
|-----------|--------|--------|--------|--------|--------|--------|
| node2vec | 0.7447 | 0.6771 | 0.7193 | 0.7450 | 0.6279 | 0.3536 |
| deepwalk | 0.7307 | 0.6579 | 0.7058 | 0.7296 | 0.6091 | 0.3416 |
| line | 0.5059 | 0.2461 | 0.4536 | 0.4523 | 0.3160 | 0.0798 |
| sdne | 0.6916 | 0.5119 | 0.6528 | 0.6718 | 0.5530 | 0.1801 |
| struc2vec | 0.4512 | 0.1249 | 0.3933 | 0.3383 | 0.2308 | 0.0516 |

brazil

| Alg | micro | macro | samples | weighted | acc | NMI |
|-----------|--------|--------|--------|--------|--------|--------|
| node2vec | 0.1481 | 0.1579 | 0.1481 | 0.1648 | 0.1481 | 0.0442 |
| deepwalk | 0.1852 | 0.1694 | 0.1852 | 0.2004 | 0.1852 | 0.0471 |
| line | 0.4444 | 0.4167 | 0.4444 | 0.4753 | 0.4444 | 0.2822 |
| sdne | 0.5926 | 0.5814 | 0.5926 | 0.5928 | 0.5926 | 0.4041 |
| struc2vec | 0.7778 | 0.7739 | 0.7778 | 0.7762 | 0.7778 | 0.3906 |

europe

| Alg | micro | macro | samples | weighted | acc | NMI |
|-----------|--------|--------|--------|--------|--------|--------|
| node2vec | 0.4125 | 0.4156 | 0.4125 | 0.4209 | 0.4125 | 0.0155 |
| deepwalk | 0.4375 | 0.4358 | 0.4375 | 0.4347 | 0.4375 | 0.0180 |
| line | 0.5000 | 0.4983 | 0.5000 | 0.5016 | 0.5000 | 0.1186 |
| sdne | 0.5000 | 0.4818 | 0.5000 | 0.4916 | 0.5000 | 0.1714 |
| struc2vec | 0.5375 | 0.5247 | 0.5375 | 0.5294 | 0.5375 | 0.0783 |

usa

| Alg | micro | macro | samples | weighted | acc | NMI |
|-----------|--------|--------|--------|--------|--------|--------|
| node2vec | 0.5420 | 0.5278 | 0.5420 | 0.5351 | 0.5420 | 0.0822 |
| deepwalk | 0.5504 | 0.5394 | 0.5504 | 0.5472 | 0.5504 | 0.0910 |
| line | 0.4160 | 0.4032 | 0.4160 | 0.4175 | 0.4160 | 0.1660 |
| sdne | 0.6092 | 0.5819 | 0.6092 | 0.5971 | 0.6092 | 0.2028 |
| struc2vec | 0.5210 | 0.5040 | 0.5210 | 0.5211 | 0.5210 | 0.0702 |

Volcano-plus commented 5 years ago

For the wiki dataset given by the author, it is single-label, so what I got is micro = samples = acc. Or do you have a more complete version of the wiki data?

dawnranger commented 5 years ago

> For the wiki dataset given by the author, it is single-label, so what I got is micro = samples = acc. Or do you have a more complete version of the wiki data?

Here is the documentation for the `average` parameter of `sklearn.metrics.f1_score`:

> average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']. This parameter is required for multiclass/multilabel targets.
>
> - 'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
> - 'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
> - 'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
> - 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).

So I think these averages can give different results in a multiclass case.
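A tiny self-contained demo of that point (toy labels, not drawn from these datasets): for 1-D multiclass targets micro-F1 equals plain accuracy while macro-F1 differs, and `average='samples'` is only defined after binarizing the labels into an indicator matrix.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import label_binarize

# toy single-label (multiclass) targets
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

# micro-F1 equals plain accuracy for 1-D multiclass labels...
print(f1_score(y_true, y_pred, average="micro"))  # 0.667
print(accuracy_score(y_true, y_pred))             # 0.667
# ...while macro-F1 averages per-class F1 and differs under imbalance
print(f1_score(y_true, y_pred, average="macro"))  # ~0.656

# 'samples' raises a ValueError on 1-D targets; it only works on
# multilabel indicator matrices, where (for single-label data) it
# again reduces to accuracy
Yt = label_binarize(y_true, classes=[0, 1, 2])
Yp = label_binarize(y_pred, classes=[0, 1, 2])
print(f1_score(Yt, Yp, average="samples"))        # 0.667
```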

shenweichen commented 5 years ago

@dawnranger That's great. I think you could open a pull request adding the results on these datasets, plus the code to reproduce them, in a new folder.

Volcano-plus commented 5 years ago

@dawnranger

> 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from accuracy_score).

Wiki is multiclass rather than multilabel, isn't it? So why is there a difference between samples and acc? In addition, for the flight data in your results, micro = samples = acc.

dawnranger commented 5 years ago

> Wiki is multiclass rather than multilabel, isn't it? So why is there a difference between samples and acc? In addition, for the flight data in your results, micro = samples = acc.

I think you are right. I used shenweichen's code:

```python
from sklearn.metrics import accuracy_score, f1_score

# Y: ground-truth labels, Y_: predictions; classify.py binarizes both
# into label-indicator matrices, so average="samples" is defined
averages = ["micro", "macro", "samples", "weighted"]
results = {}
for average in averages:
    results[average] = f1_score(Y, Y_, average=average)
results['acc'] = accuracy_score(Y, Y_)
```

and I got a warning with the wiki dataset:

```
python3/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
```

As discussed on Stack Overflow, an unlucky train/test split (leaving some classes with no predicted samples) may be to blame for this warning.
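One common mitigation, sketched here with synthetic data standing in for the node embeddings (the array names and sizes are illustrative, not from this repo): use a stratified split so every class is represented in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-ins for node embeddings and labels
X = np.random.rand(200, 16)
y = np.random.randint(0, 5, size=200)

# stratify=y preserves class proportions in both splits, so no class
# is entirely absent from training (which can leave it unpredicted
# and trigger the ill-defined F-score warning)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```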

Volcano-plus commented 5 years ago

@dawnranger Yes. I found that classify.py is similar to scoring.py in deepwalk, which is provided by the author: https://github.com/phanein/deepwalk/blob/master/example_graphs/scoring.py. What confused me is that the author provided neither the results nor the origin of the wiki dataset. In addition, I tried the BlogCatalog dataset (multi-label) mentioned in the node2vec paper, setting the parameters as the paper did (d=128, r=10, l=80, k=10, training percent=50%, p=q=0.25), but I got a Macro-F1 of 0.12, far from the reported result (0.2581). So depressing...
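For what it's worth, a hedged sketch of how those paper settings would map onto this repo's Node2Vec API (parameter names follow the README examples; the BlogCatalog edge-list path and the `embed_size` keyword are assumptions, so check them against your local copy):

```python
import networkx as nx
from ge import Node2Vec

# BlogCatalog edge list; the path is illustrative, the dataset is not
# shipped with this repo
G = nx.read_edgelist('blogcatalog_edgelist.txt', create_using=nx.Graph())

# paper settings: l=80 -> walk_length, r=10 -> num_walks,
# p=q=0.25, d=128 -> embed_size, k=10 -> window_size
model = Node2Vec(G, walk_length=80, num_walks=10, p=0.25, q=0.25, workers=4)
model.train(embed_size=128, window_size=10, iter=5)
embeddings = model.get_embeddings()
```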

960924 commented 4 years ago

Hello, from these results the accuracy does not seem very high. What is the cause? Is it a data problem?