working-yuhao / DEAL

IJCAI2020
MIT License

Loading custom datasets #6

Open · zoemeini opened 3 years ago

zoemeini commented 3 years ago

Hello everybody :)

I am working with a custom pipeline for performing link prediction on a graph. I construct this graph by processing CSV data, and in the end I obtain an object of class pytorch_geometric.Dataset (the same class as the default datasets used in this repo, like cora, protein, email...).

I would like to know which part of this repo's code I should modify to load my custom dataset object for link prediction.

Thank you very much!

Lemour-sudo commented 3 years ago

You may need to create a folder for your custom dataset inside data/, similar to the CiteSeer folder under data/. Then pass the name of your custom data folder as an argument on the command line when calling the train.py script. For example, if I save my custom dataset files under data/ as data/Custom-Data, then I can point to this new dataset folder by calling: python train.py --dataset Custom-Data
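To see exactly which files such a folder is expected to contain, a minimal sketch like the one below may help. It only assumes you have cloned the repo and downloaded the CiteSeer example data; data/Custom-Data is a hypothetical placeholder for your own dataset folder.

```python
# Sketch: list the files the repo ships for an example dataset, so you know
# which files (and formats) to reproduce for your own graph.
# Assumes you run this from the repo root with data/CiteSeer already present.
import os

reference_dir = os.path.join("data", "CiteSeer")   # existing example dataset
custom_dir = os.path.join("data", "Custom-Data")   # hypothetical custom dataset folder
os.makedirs(custom_dir, exist_ok=True)

# Your custom folder needs files with the same names and formats as these
# before `python train.py --dataset Custom-Data` will work.
for name in sorted(os.listdir(reference_dir)):
    print(name)
```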

lucky6qi commented 2 years ago

I have also started working on loading custom datasets, but I don't know how to prepare my data into files matching the format of the data/CiteSeer folder. Moreover, when I run python train.py --dataset CiteSeer, I end up with the following error:


  File "train.py", line 117, in <module>
    loss = deal.default_loss(inputs, labels, data, thetas=theta_list, train_num=int(X_train.shape[0] *args.train_ratio)*2)
  File "/Users/liuqi7/deal/model.py", line 466, in default_loss
    dists = data.dists[nodes[:,0],nodes[:,1]] 
TypeError: 'NoneType' object is not subscriptable```
lajd commented 2 years ago

Hi @lucky6qi, I had the same issue. The solution is to download the dists-1.dat file, as noted in data/CiteSeer (About dist data). I created a fork with updated setup instructions (see the Installation section here: https://github.com/lajd/DEAL/blob/master/README.md).

basudev-yadav commented 2 years ago

@lajd I looked at your code. It looks like you have used datasets available in the PyTorch Geometric datasets. I want to run it on my own data, but I don't know how to prepare the data into the format used by the code, for example those NumPy zip files and the sparse matrix. I am having trouble understanding what those files represent and how they are built.

basudev-yadav commented 2 years ago

@lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

FatemeMirzaeii commented 1 year ago

> @lajd I looked at your code. Looks like you have used datasets available in the pytorch geometric datasets. I want to run it on my data but I don't know how to prepare the data into the format that is used in the code for example those numpy zip files and sparse matrix. I am facing problems in understanding what those files represent and on what basis they are made.

Same question. Have you found the answer?

fatemehkarimi commented 5 days ago

> @lajd Is it possible that the dists file contains the normalized shortest path distances between each pair of nodes?

Yes, it is the shortest path between each pair of nodes.
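Based on that, a minimal sketch of how one might generate such a matrix for a custom graph is below. The file name dists-1.dat, the pickle format, the handling of unreachable pairs, and whether DEAL expects the distances to be normalized are all assumptions; check how the repo actually loads data.dists (e.g. in the data utilities) before relying on this.

```python
# Sketch: compute a pairwise shortest-path matrix for a small custom graph
# and save it. File name, format, and normalization are assumptions.
import os
import pickle
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import shortest_path

num_nodes = 4
edges = np.array([[0, 1], [1, 2], [2, 3]])  # toy undirected edge list

# Build a sparse adjacency matrix and symmetrize it for an undirected graph.
adj = sp.coo_matrix(
    (np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
    shape=(num_nodes, num_nodes),
)
adj = adj + adj.T

# All-pairs shortest paths (hop counts, since unweighted=True).
dists = shortest_path(adj.tocsr(), method="D", directed=False, unweighted=True)

# Cap unreachable pairs and normalize to [0, 1] -- both steps are assumptions.
finite = ~np.isinf(dists)
dists[~finite] = dists[finite].max() + 1
dists = dists / dists.max()

os.makedirs(os.path.join("data", "Custom-Data"), exist_ok=True)  # hypothetical path
with open(os.path.join("data", "Custom-Data", "dists-1.dat"), "wb") as f:
    pickle.dump(dists, f)
```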