jocpae / VesselGraph

MIT License

Question regarding atlas processed #22

Closed gsabarinath02 closed 1 year ago

gsabarinath02 commented 1 year ago

Respected Authors,

We are currently researching link prediction methods that improve on the current benchmark for the OGB Vessel Graph dataset. We have spent a considerable amount of time trying to understand the dataset and the problem definition, and to work accordingly. We now have a question and would like further clarification.

We have downloaded the csv files from the public GitHub repository https://github.com/jocpae/VesselGraph and have been building our models using the information available there. While working with both synthetic and non-synthetic datasets, we noticed an additional csv file that is part of every non-synthetic dataset. The synthetic ones have a "nodes processed" and an "edges processed" file; the non-synthetic ones additionally contain a csv file named "atlas processed", whose purpose we do not know. Even after reading further documentation, we have not understood its purpose. We humbly request your help in understanding this file so that we may conduct our research in the right direction. We hope you can answer the following questions and clear our doubts.

  1. What is the "atlas processed" csv file ?
  2. What is its purpose in the larger scheme of things, in relation to the other csv files?
  3. Do we require to integrate that into our model like we did with nodes processed and edges processed csv file? If so, how do we properly utilize it?

We hope we have formulated our query properly and hope to hear from you as soon as possible.

jqmcginnis commented 1 year ago

Hello @gsabarinath02,

thank you very much for your interest in the dataset.

As you already mentioned, the OGB-vessel dataset is the most accessible option and the easiest one for prototyping a new link prediction algorithm. So, if you are interested in testing on a well-curated dataset, we recommend working with the OGB-vessel graph and its OGB implementation. Also, if you are working on a paper and want to make it easily reproducible, OGB is probably the way to go.

That being said, if you want to go into the details, skip all of the current processing and do it yourself (because you want to learn / understand the process or do not agree with the current implementation), it's definitely possible to work at the csv-file level. So let's have a look:

  1. What is the "atlas processed" csv file ?

The Atlas file provides the possibility of adding a new node feature, namely the Allen Brain Atlas region (of the mouse), to the existing graphs. The VesSAP paper provides registered brain region files, and we one-hot-encoded these into a feature depending on the (x,y,z) position of the node. You can find a description of this in our paper, section 3.1 - Table 2.
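To illustrate the idea, here is a minimal sketch of looking up a region label from a node's (x,y,z) position and one-hot encoding it. It is not the code used to produce the Atlas csv files: the toy `volume`, the `region_ids` list, the rounding to the nearest voxel and the helper names `region_at` and `one_hot` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy label volume, indexed as volume[x][y][z].
# Each voxel stores an integer region id (assumed ids: 0, 1, 2).
volume = [
    [[0, 0], [1, 1]],
    [[2, 2], [1, 0]],
]
region_ids = [0, 1, 2]


def region_at(vol, position: Tuple[float, float, float]) -> int:
    """Return the region id of the voxel nearest to a node position."""
    x, y, z = (int(round(c)) for c in position)
    return vol[x][y][z]


def one_hot(label: int, labels: Sequence[int]) -> List[int]:
    """Encode a region id as a 0/1 vector with one slot per known region."""
    return [1 if label == known else 0 for known in labels]


# Hypothetical node positions in voxel coordinates.
node_positions = [(0.1, 0.9, 0.2), (1.0, 0.0, 1.0), (0.0, 0.0, 0.0)]

# One row per node, one column per region.
atlas_features = [
    one_hot(region_at(volume, p), region_ids) for p in node_positions
]
```

Each row of `atlas_features` has exactly one entry set to 1, marking the region that the node falls into.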

  2. What is its purpose in the larger scheme of things, in relation to the other csv files?

Our workflow is the following:

  1. voreen graph generation (generates csvs)
  2. post-process graph (that is, merge edges, etc)
  3. generate pyg graph
  4. generate OGB graph

In step 3, we offer the option of using more node features than those currently used in our link prediction task. To use them (i.e. the Atlas csv file), you need to set this flag to True; this is also required if you want the Atlas features in the OGB graph: https://github.com/jocpae/VesselGraph/blob/d9b7e5052103c356240c2780785075891fe9014d/source/pytorch_dataset/link_dataset.py#L131
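As a rough sketch of what such a switch does conceptually, the example below appends per-node Atlas columns to the base node features only when a flag is set. It is not the repository's implementation: the csv layouts, the column names, the `use_atlas` parameter name and the helper functions are hypothetical and chosen for illustration only.

```python
import csv
import io

# Hypothetical miniature csv contents; the real files may use other
# column names, delimiters and many more columns.
NODES_CSV = """node_id,x,y,z
0,0.1,0.9,0.2
1,1.0,0.0,1.0
"""

ATLAS_CSV = """node_id,region_0,region_1,region_2
0,0,1,0
1,0,0,1
"""


def read_rows(text):
    """Parse csv text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_node_features(nodes_text, atlas_text, use_atlas=False):
    """Return one feature list per node, optionally with Atlas columns."""
    features = {}
    for row in read_rows(nodes_text):
        features[int(row["node_id"])] = [
            float(row[key]) for key in ("x", "y", "z")
        ]
    if use_atlas:
        # Append the one-hot region columns to the matching node.
        for row in read_rows(atlas_text):
            node_id = int(row["node_id"])
            extra = [float(v) for k, v in row.items() if k != "node_id"]
            features[node_id].extend(extra)
    return [features[node_id] for node_id in sorted(features)]


base = build_node_features(NODES_CSV, ATLAS_CSV)
with_atlas = build_node_features(NODES_CSV, ATLAS_CSV, use_atlas=True)
```

With the flag off, each node keeps its three coordinate features; with it on, each node gains one extra column per region.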

  3. Do we require to integrate that into our model like we did with nodes processed and edges processed csv file? If so, how do we properly utilize it?

Normally, you don't need to manually integrate the csvs into your model at all, unless you decide not to use pyg and ogb. However, I would strongly recommend using these libraries - they make your life a lot easier and offer you a superior platform for your research. Is there a specific reason you do not want to use them?

That being said, you do not need the Atlas csv files; in fact, you would be the first to actually use them! If you want to experiment, I would be curious to know whether they boost the performance!

If you have any further questions, I am happy to help! :slightly_smiling_face: