working-yuhao / DEAL

IJCAI2020
MIT License

Loading custom datasets #6

Open · zoemeini opened 3 years ago

zoemeini commented 3 years ago

Hello everybody :)

I am working with a custom pipeline for performing link prediction on a graph. I construct this graph by processing CSV data, and in the end I obtain an object of class pytorch_geometric.Dataset (the same class as the default datasets used in this repo, like cora, protein, email...).

I would like to know which part of this repo's code I should modify to load my custom dataset object for link prediction.

Thank you very much!

Lemour-sudo commented 3 years ago

You may need to create a folder for your custom dataset inside data/, similar to the CiteSeer folder under data/. Then pass the name of your custom data folder as an argument on the command line when calling the train.py script. For example, if I save my custom dataset files under data/ as data/Custom-Data, then I can point to this new dataset folder by calling: python train.py --dataset Custom-Data
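To see exactly which files such a folder is expected to contain, a minimal sketch like the one below may help. It only assumes you have cloned the repo and downloaded the CiteSeer example data; data/Custom-Data is a hypothetical placeholder for your own dataset folder.

```python
# Sketch: list the files the repo ships for an example dataset, so you know
# which files (and formats) to reproduce for your own graph.
# Assumes you run this from the repo root with data/CiteSeer already present.
import os

reference_dir = os.path.join("data", "CiteSeer")   # existing example dataset
custom_dir = os.path.join("data", "Custom-Data")   # hypothetical custom dataset folder
os.makedirs(custom_dir, exist_ok=True)

# Your custom folder needs files with the same names and formats as these
# before `python train.py --dataset Custom-Data` will work.
for name in sorted(os.listdir(reference_dir)):
    print(name)
```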

lucky6qi commented 2 years ago

I have also started working on loading custom datasets, but I don't know how to prepare my data into files matching the format of the data/CiteSeer folder. Moreover, when I run python train.py --dataset CiteSeer, I end up with the following error:


  File "train.py", line 117, in <module>
    loss = deal.default_loss(inputs, labels, data, thetas=theta_list, train_num=int(X_train.shape[0] *args.train_ratio)*2)
  File "/Users/liuqi7/deal/model.py", line 466, in default_loss
    dists = data.dists[nodes[:,0],nodes[:,1]] 
TypeError: 'NoneType' object is not subscriptable```
lajd commented 2 years ago

Hi @lucky6qi, I had the same issue. The solution is to download the dists-1.dat file, as noted in data/CiteSeer (About dist data). I created a fork with updated setup instructions (see the Installation section here: https://github.com/lajd/DEAL/blob/master/README.md).

basudev-yadav commented 2 years ago

@lajd I looked at your code. It looks like you have used datasets available in the PyTorch Geometric datasets. I want to run it on my own data, but I don't know how to prepare the data into the format used by the code, for example those NumPy zip files and the sparse matrix. I am having trouble understanding what those files represent and how they are built.

basudev-yadav commented 2 years ago

@lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

FatemeMirzaeii commented 1 year ago

> @lajd I looked at your code. Looks like you have used datasets available in the pytorch geometric datasets. I want to run it on my data but I don't know how to prepare the data into the format that is used in the code for example those numpy zip files and sparse matrix. I am facing problems in understanding what those files represent and on what basis they are made.

Same question. Have you found the answer?

fatemehkarimi commented 5 days ago

> @lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

Yes, it is the shortest path between each pair of nodes.
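Based on that, a minimal sketch of how one might generate such a matrix for a custom graph is below. The file name dists-1.dat, the pickle format, the handling of unreachable pairs, and whether DEAL expects the distances to be normalized are all assumptions; check how the repo actually loads data.dists (e.g. in the data utilities) before relying on this.

```python
# Sketch: compute a pairwise shortest-path matrix for a small custom graph
# and save it. File name, format, and normalization are assumptions.
import os
import pickle
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import shortest_path

num_nodes = 4
edges = np.array([[0, 1], [1, 2], [2, 3]])  # toy undirected edge list

# Build a sparse adjacency matrix and symmetrize it for an undirected graph.
adj = sp.coo_matrix(
    (np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
    shape=(num_nodes, num_nodes),
)
adj = adj + adj.T

# All-pairs shortest paths (hop counts, since unweighted=True).
dists = shortest_path(adj.tocsr(), method="D", directed=False, unweighted=True)

# Cap unreachable pairs and normalize to [0, 1] -- both steps are assumptions.
finite = ~np.isinf(dists)
dists[~finite] = dists[finite].max() + 1
dists = dists / dists.max()

os.makedirs(os.path.join("data", "Custom-Data"), exist_ok=True)  # hypothetical path
with open(os.path.join("data", "Custom-Data", "dists-1.dat"), "wb") as f:
    pickle.dump(dists, f)
```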