muhanzhang / DGCNN

Code for "M. Zhang, Z. Cui, M. Neumann, and Y. Chen, An End-to-End Deep Learning Architecture for Graph Classification, AAAI-18".
MIT License

Use DGCNN to run python array data #2

Closed: alexandrazxf closed this issue 5 years ago

alexandrazxf commented 6 years ago

I tried to run DGCNN on my own dataset, which is in the form of Python arrays, and I don't know how to load them into Torch. Any suggestions?

muhanzhang commented 6 years ago

Hello. I found a package that loads numpy arrays into Torch: https://github.com/htwaijry/npy4th

After installing it, you may need to write a script in Torch that transforms your numpy arrays into Torch tensors in the form dataset = {instance={i: {1: A_i, 2: x_i}}, label={i: label_i}}, and saves the transformed dataset to the folder "data/".

You can check the required Torch dataset format in the Torch REPL ("th") by running: dataset = torch.load("data/MUTAG.dat")
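For example, something like this (an untested sketch; the .npy file names, the output path, and the nGraphs/labels variables are placeholders for your own data):

```lua
-- Build the dataset table that main.lua expects from per-graph .npy files.
local npy4th = require 'npy4th'

local nGraphs = 100   -- placeholder: your number of graphs
local labels = {}     -- placeholder: fill with your graph labels

local dataset = {instance = {}, label = {}}
for i = 1, nGraphs do
  local A = npy4th.loadnpy(string.format('adj_%d.npy', i))   -- adjacency matrix A_i
  local x = npy4th.loadnpy(string.format('feat_%d.npy', i))  -- node features x_i
  dataset.instance[i] = {A, x}                               -- {1: A_i, 2: x_i}
  dataset.label[i] = labels[i]
end
torch.save('data/MYDATA.dat', dataset)
```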

Let me know if it solves your problem.

alexandrazxf commented 6 years ago

Thank you for your advice.

I managed to store my Python arrays in txt files and load them in Torch; it is quite slow but works fine. Your way seems better.

There's one small issue I encountered in the process: my tensors have to be transformed to FloatTensor for main.lua to work; with any other type, the "resizeAs" call at line 431 returns an error.

alexandrazxf commented 6 years ago

Also, after all the epochs, there's an error about accuracy log:

```
Performance on test set after all epochs:
/root/torch/install/bin/luajit: main.lua:895: attempt to concatenate global 'testAcc' (a nil value)
stack traceback:
        main.lua:895: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00405d50
```

muhanzhang commented 6 years ago

Hi, for your first problem: I believe the argument of "resizeAs" must have the same type as the calling tensor. For example, if you want "a = a:resizeAs(b)", you should first ensure "b" has the same type as "a", via "b = b:type('torch.FloatTensor')" (supposing "a" is a FloatTensor). Also, in Torch, most numerical operations only support float and double tensors.
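A minimal illustration (the tensors here are mine, not from main.lua):

```lua
local a = torch.FloatTensor(3, 4)
local b = torch.DoubleTensor(5, 5)
-- a:resizeAs(b) would error here because the types differ; convert b first:
b = b:type('torch.FloatTensor')
a:resizeAs(b)   -- a is now a 5x5 FloatTensor
```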

For your second problem, I am not sure what the error is. Can you print intermediate testAcc values before line 895 to see where it becomes nil?

alexandrazxf commented 6 years ago

Thanks for your explanation of my first problem.

The second problem was resolved when I updated your code to the latest version; the error occurred in the code I downloaded back in January.

How do I run 10-fold cross-validation using your code? Do I need to split my data in 10 parts and feed them to dgcnn respectively, or does the code split it for me?

muhanzhang commented 6 years ago

You are welcome. To run 10-fold CV, you may just use the "run_all.sh" script. For example, typing "./run_all.sh DD 1" will run on the DD dataset with GPU 1 for 10 repetitions of the 10-fold CV experiment, using the pre-stored split indices in "data/shuffle/".

To run only one round of 10-fold CV, change line 17 in run_all.sh to "for i in $(seq 1 1)".

You may also generate your own random indices and overwrite the pre-stored ones. The pre-stored indices were generated at line 46 of "compare.m" when running the graph kernel experiments.

By default, DGCNN will randomly split the dataset without using the pre-stored indices; see line 33 of "main.lua".
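For example, a hedged sketch of writing your own indices in Torch (the file name and the one-index-per-line layout are just for illustration; match the format of the existing files in "data/shuffle/" before overwriting anything):

```lua
local N = 188                  -- e.g. the number of graphs in your dataset
local perm = torch.randperm(N) -- random permutation of 1..N
local f = assert(io.open('data/shuffle/MYDATA_shuffle.txt', 'w'))
for i = 1, N do
  f:write(string.format('%d\n', perm[i]))
end
f:close()
```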

alexandrazxf commented 6 years ago

There's no validation set in your experiment. How did you determine MaxEpoch so that the model achieves the best result without overfitting?

muhanzhang commented 6 years ago

A validation set is not very useful here, since the datasets are too small for a validation set to be representative of the test set. MaxEpoch was empirically tuned for each dataset on a random 90%/10% train/validation split, choosing the point where optimization converged and accuracy stopped increasing; it was then used consistently for all 100 runs. This MaxEpoch is not guaranteed to achieve the best result or avoid overfitting every time, since it is only a rough empirical estimate. To enable early stopping, set "-valRatio" to a value greater than 0 and append "-earlyStop".
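For example, something like "th main.lua -dataName MUTAG -valRatio 0.1 -earlyStop" (only "-valRatio" and "-earlyStop" are confirmed above; the other option names may differ, so check the option definitions at the top of "main.lua").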

alexandrazxf commented 6 years ago

That makes sense. Thanks for your quick reply!