Open zoemeini opened 3 years ago
You may need to create a folder for your custom dataset under data/, similar to the existing data/CiteSeer folder, and then pass the name of that folder as an argument on the command line when calling the train.py script. For example, if I save my custom dataset files as data/Custom-Data, I can point to this new dataset folder by calling: python train.py --dataset Custom-Data
I also started working on loading custom datasets, but I don't know how to prepare my data in the format used by the files in the data/CiteSeer folder. Moreover, when I run python train.py --dataset CiteSeer, I get the following error:
```
  File "train.py", line 117, in <module>
    loss = deal.default_loss(inputs, labels, data, thetas=theta_list, train_num=int(X_train.shape[0] *args.train_ratio)*2)
  File "/Users/liuqi7/deal/model.py", line 466, in default_loss
    dists = data.dists[nodes[:,0],nodes[:,1]]
TypeError: 'NoneType' object is not subscriptable
```
Hi @lucky6qi, I had the same issue, and the solution is to download the dists-1.dat file, as described in data/CiteSeer/About dist data.
I created a fork to assist with the setup instructions (see Installation here https://github.com/lajd/DEAL/blob/master/README.md)
@lajd I looked at your code. It looks like you used datasets that are available in the PyTorch Geometric datasets. I want to run it on my own data, but I don't know how to prepare the data in the format the code expects, for example those NumPy zip files and the sparse matrix. I am having trouble understanding what those files represent and on what basis they are made.
@lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?
> @lajd I looked at your code. Looks like you have used datasets available in the pytorch geometric datasets. I want to run it on my data but I don't know how to prepare the data into the format that is used in the code for example those numpy zip files and sparse matrix. I am facing problems in understanding what those files represent and on what basis they are made.

Same question. Have you found the answer?
> @lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

Yes, it is the shortest path between each pair of nodes.
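
For anyone building this for their own graph, here is a minimal sketch of computing pairwise shortest-path (hop) distances with SciPy. It is not the repo's own preprocessing code. The function names `pairwise_hop_distances` and `normalize_distances` are hypothetical, and dividing by the largest finite distance is only one possible normalization, assumed here for illustration. The on-disk layout of dists-1.dat is not covered, so check how the repo reads that file before saving anything.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import shortest_path


def pairwise_hop_distances(edges, num_nodes):
    """Return a dense (num_nodes x num_nodes) matrix of hop counts.

    `edges` is a list of (u, v) node-index pairs, treated as undirected.
    Unreachable pairs are reported as np.inf.
    """
    rows = [u for u, _ in edges]
    cols = [v for _, v in edges]
    adj = sp.csr_matrix(
        (np.ones(len(edges)), (rows, cols)), shape=(num_nodes, num_nodes)
    )
    # unweighted=True counts edges; directed=False uses each edge both ways.
    return shortest_path(adj, method="D", directed=False, unweighted=True)


def normalize_distances(dists):
    """Scale finite distances into [0, 1]; unreachable pairs are set to 1.0.

    Assumption: this scaling is illustrative, not necessarily what the
    repo's dists file uses.
    """
    finite = np.isfinite(dists)
    max_d = dists[finite].max()
    scale = max_d if max_d > 0 else 1.0
    return np.where(finite, dists / scale, 1.0)


# Example: a 3-node path graph 0 - 1 - 2
example = normalize_distances(pairwise_hop_distances([(0, 1), (1, 2)], 3))
print(example)
```

Note that the dense matrix grows quadratically with the number of nodes, so for large graphs you may need to compute and store it in chunks.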
Hello everybody :)
I am working with a custom pipeline for performing link prediction in a graph. I construct this graph by processing CSV data, but in the end I obtain an object of class pytorch_geometric.Dataset (the same class as the default ones used in this repo, like cora, protein, email...).
I would like to know which part of the code in this repo I should modify to load my custom dataset object for link prediction.
Thank you very much!