gegemy opened this issue 5 years ago
You can put the code of https://github.com/tkipf/gcn into embedding_test/src/baseline/
and just run it.
Please make sure that the GCN has been installed in that directory. Note that the GCN needs node features as input, so you can only set the dataset to pubmed and the task to classification in the Makefile.
Besides, you can check whether the GCN works well by setting the variable debug to True in src/main.py.
Thanks a lot. Besides that, you said to tune the size of the hidden layer. Is this parameter 'flags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')'? Or do you mean tuning the number of hidden layers?
Yes, it means the number of units in hidden layer 1.
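For example, one way to sweep this parameter from outside the GCN code is to re-launch its train.py with different values. This is only a minimal sketch: the directory used as cwd is an assumption and depends on where you put the tkipf/gcn code.

```python
# Minimal sketch: sweep the GCN 'hidden1' flag by re-launching train.py
# with different values. The path below is a hypothetical location that
# assumes the tkipf/gcn code was copied into embedding_test/src/baseline/.
import subprocess

for hidden1 in [8, 16, 32, 64]:
    subprocess.run(
        ["python", "train.py",
         "--dataset", "pubmed",
         "--hidden1", str(hidden1)],       # number of units in hidden layer 1
        cwd="embedding_test/src/baseline/gcn",  # assumed GCN directory
        check=True,
    )
```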
OK, thanks a lot. I'll try it, please keep in touch!
I find that the dataset format expected by the GCN source code is different from the pubmed dataset you provide, and the GCN code has its own load-data function. Should we use the pubmed dataset shipped with GCN to train the GCN model and the AutoNE-provided dataset to train the classifier? Do we need to rewrite the GCN data-loading code? Is the pubmed dataset you provide the same as the one GCN provides?
For easier processing, I changed the format of pubmed. You can look at the convert_gcn_data function in src/utils.py to see how to preprocess it.
Besides, you can find the download link for the preprocessed datasets in the readme.
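In case it helps, the conversion is conceptually along these lines. This is only a sketch, not the actual convert_gcn_data implementation; it assumes the standard ind.pubmed.* files from the tkipf/gcn release and writes a plain edge list (features and labels would be dumped similarly).

```python
# Rough sketch of converting the tkipf/gcn pubmed graph file into an edge list.
# The real convert_gcn_data in src/utils.py may differ in paths and outputs.
import pickle
import networkx as nx

def convert_gcn_data_sketch(gcn_dir="data", out_edgelist="pubmed.edgelist"):
    # ind.pubmed.graph is a dict {node_id: [neighbor ids]} in the GCN release
    with open(f"{gcn_dir}/ind.pubmed.graph", "rb") as f:
        graph = pickle.load(f, encoding="latin1")
    g = nx.from_dict_of_lists(graph)
    # write a plain "u v" edge list, one edge per line
    nx.write_edgelist(g, out_edgelist, data=False)
```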
Did you change the GCN source code to save its embedding results? If not, did you directly use the classification result provided by GCN as the AutoNE final result? If so, does that require changing other parts of AutoNE (because it seems to need the embeddings as input)?
Sorry, I have modified the GCN code to adapt the I/O. The gcn and AROPE codes are provided here.
Maybe I have fixed this problem: you need to run the convert_gcn_data function first, and the run command also needs to set the hyper-parameter '--output_dir' so that the gcn embeddings get saved... Thanks a lot.
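For reference, dumping the embeddings conceptually looks like the sketch below. This is hypothetical and not the actual modified GCN code: the hidden-layer tensor (here called hidden_op), the feed_dict, and the output file name are all assumptions.

```python
# Hypothetical sketch of saving GCN hidden-layer activations as node embeddings.
# Assumes a trained TF1-style GCN model whose hidden-layer output is available
# as a tensor `hidden_op`, an active session `sess`, and the usual feed_dict.
import os
import numpy as np

def save_gcn_embeddings(sess, hidden_op, feed_dict, output_dir):
    emb = sess.run(hidden_op, feed_dict=feed_dict)  # shape: [num_nodes, hidden1]
    os.makedirs(output_dir, exist_ok=True)
    np.save(os.path.join(output_dir, "gcn_embeddings.npy"), emb)
```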
I tried to reproduce the results of the AutoNE paper, and unfortunately I did not get the same result.
I tried to reproduce the Deepwalk experiment results on the BlogCatalog and Wikipedia datasets (using 3 hyper-parameters), but I got results different from the AutoNE paper. For example, my BayesOpt results are worse than Random search, while the paper says Random search is the worst.
I also tried adjusting 4 hyper-parameters and added NaiveAutoNE, and again got different results. It seems that NaiveAutoNE does better than AutoNE, and Random search is still the best?
Would you mind helping me reproduce it? Maybe I missed some key points?
It is very strange.
The first run of BayesOpt is just random, so it should be similar to random search. But in the first trial of your experiments, random search is much better. The performance curve is not very stable, since some hyperparameters are discrete and the performance function is non-smooth. I think you need to run it many times and take the mean value.
In fact, I also find that BayesOpt may sometimes be worse than Random in the beginning, for example in Figure 3 (a) and (b). That is because BayesOpt tends to explore the preset bounds of the hyperparameters in the first few trials to get the shape of the curve.
As for the 4 hyper-parameters, I think the higher curve in your figure should be the better one, so AutoNE has a better result than NaiveAutoNE. The problem with random may be similar to the previous one; please run more times.
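For example, averaging over repeated runs could look like this. It is just a sketch: run_once is a placeholder for one full hyperparameter-search run that returns a list of per-trial scores.

```python
# Sketch: average the per-trial performance over several independent runs.
import numpy as np

def mean_curve(run_once, n_runs=20, n_trials=50):
    curves = np.array([run_once(n_trials) for _ in range(n_runs)])  # [n_runs, n_trials]
    return curves.mean(axis=0), curves.std(axis=0)
```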
I ran these baselines 5 times with 3 tunable parameters and plotted the mean values directly without any post-processing, but I still get results different from the AutoNE paper.
This experiment shows that the performance values of all baselines are unstable, and random search is even the best one? But the performance curves in the AutoNE paper grow stably, and random search is the worst? Is it because I have not run the baselines enough times?
In fact, in the hyperparameter optimization experiments, the AUC at trial T is the best AUC value found so far. So the curve of each method should be non-decreasing.
In my code, I do this in the random_search function (main.py, Line 274-277). But in Bayes_opt and AutoNE, I do not do it, for debugging purposes. You should do it yourself when you plot the figure.
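Concretely, that post-processing is just a running maximum over the raw per-trial AUC values, along these lines (a sketch, not the exact code in random_search):

```python
# Sketch: convert raw per-trial AUC values into the "best AUC found so far"
# curve before plotting, so every method's curve is non-decreasing.
import numpy as np

def best_so_far(scores):
    return np.maximum.accumulate(np.asarray(scores))

# e.g. best_so_far([0.61, 0.58, 0.65, 0.63]) -> array([0.61, 0.61, 0.65, 0.65])
```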
As for random sometimes being better than Bayes_opt, I think that is OK. In my results, Bayes_opt is also not always better than random.
In the KDD19 paper, besides Deepwalk and AROPE, you also use the GCN model to demonstrate the performance of the AutoNE framework, but your source code does not contain this part? Would you mind adding it?