Closed l4b4r4b4b4 closed 2 weeks ago
Hey which disease id did you use?
5819.0.
I'm currently trying to understand which dataset split I need to load in order to make predictions on a given disease id or a list of given ids.
I don't really understand the section in the readme about loading a disease_eval
split and defining an id, before passing a list of disease ids to eval_disease_centric.
have you looked at this demo notebook? https://github.com/mims-harvard/TxGNN/blob/main/TxGNN_Demo.ipynb this runs
So the idea is that during training, the model only fine-tunes on a small set of known indication drug-disease pairs. During inference, we want to score all drugs for a given disease, like a small virtual screening. That is why, no matter what the data split is, it is always useful to get the evaluation output.
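For reference, the workflow under discussion looks roughly like this. This is a sketch based on my reading of the README and demo notebook; the class names (TxData, TxGNN, TxEval), argument names, and hyperparameter values are assumptions from that documentation and may differ across versions:

```
from txgnn import TxData, TxGNN, TxEval

# Load the knowledge graph and hold out one disease for evaluation.
# split='disease_eval' plus disease_eval_idx masks that disease's
# indications from training.
tx_data = TxData(data_folder_path='./data')
tx_data.prepare_split(split='disease_eval', disease_eval_idx=5819.0)

model = TxGNN(data=tx_data)
model.model_initialize(n_hid=100, n_inp=100, n_out=100)
model.pretrain(n_epoch=2)
model.finetune(n_epoch=500)

# Disease-centric evaluation: score every drug against the given diseases.
evaluator = TxEval(model=model)
result = evaluator.eval_disease_centric(disease_idxs=[5819.0],
                                        relation='indication',
                                        save_result=False)
```

The point is that the split only controls which indication edges are hidden during fine-tuning; eval_disease_centric always scores the full drug list for the diseases you pass in.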
Yes I have. I have a feeling this might be connected to the underlying csv files not being the right ones.
The download links in TxData
are not up to date anymore. I updated them to the following:
data_download_wrapper(
"https://dvn-cloud.s3.amazonaws.com/10.7910/DVN/IXA7BM/1805e679c4c-72137dbedbf1?response-content-disposition=attachment%3B%20filename%2A%3DUTF-8%27%27kg.csv&response-content-type=text%2Fcsv&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20241007T075549Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=AKIAIEJ3NV7UYCSRJC7A%2F20241007%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=0e04af054b75fd6928054d2209e3c1826d47fbf68f5a4898e783166684582cd8",
os.path.join(self.data_folder, "kg.csv"),
)
data_download_wrapper(
"https://dvn-cloud.s3.amazonaws.com/10.7910/DVN/IXA7BM/1805e69f00e-fcf0acc588bb.orig?response-content-disposition=attachment%3B%20filename%2A%3DUTF-8%27%27nodes.csv&response-content-type=text%2Fcsv&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20241007T081502Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=AKIAIEJ3NV7UYCSRJC7A%2F20241007%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=fb42acf98df1950a329a3555a9f13a10157ea2746c6e29ee87ff538e4ca2a1c5",
os.path.join(self.data_folder, "nodes.csv"),
)
data_download_wrapper(
"https://dvn-cloud.s3.amazonaws.com/10.7910/DVN/IXA7BM/1805e69de19-31377b621f41?response-content-disposition=attachment%3B%20filename%2A%3DUTF-8%27%27edges.csv&response-content-type=text%2Fcsv&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20241007T081358Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=AKIAIEJ3NV7UYCSRJC7A%2F20241007%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=36e39ecbb4e885d09a9b2a6945f8311af55cd72972b09fb7458ae3de95d35488",
os.path.join(self.data_folder, "edges.csv"),
)
Also, why does it load and read in nodes.csv
as tab-delimited and not as comma-separated?
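To illustrate why the delimiter matters here: reading a comma-separated file with a tab separator silently collapses each row into a single column instead of failing. A minimal demonstration (the file content below is illustrative, not the actual nodes.csv):

```python
import pandas as pd
from io import StringIO

# A comma-separated file read with sep="\t" collapses every row
# into one column, because no tabs are found to split on.
csv_text = "id,name\n5819.0,anemia\n"

wrong = pd.read_csv(StringIO(csv_text), sep="\t")  # 1 column
right = pd.read_csv(StringIO(csv_text), sep=",")   # 2 columns

print(wrong.shape[1], right.shape[1])  # 1 2
```

So if the loader expects tab-delimited input but the downloaded files are comma-separated (or vice versa), everything downstream sees malformed frames without an obvious error.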
OK, after some debugging, here is the likely solution: change the .append
calls on DataFrames in random_fold
and complex_disease_fold
to use pd.concat, since DataFrame.append was removed in pandas 2.x.
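The replacement pattern is the standard pandas 2.x migration. A minimal sketch (the frames here are illustrative, not the actual fold DataFrames):

```python
import pandas as pd

# Illustrative frames standing in for the fold DataFrames.
df = pd.DataFrame({"x_idx": [0, 1], "relation": ["indication", "indication"]})
new_rows = pd.DataFrame({"x_idx": [2], "relation": ["contraindication"]})

# pandas < 2.0 (removed in 2.x):
#   df = df.append(new_rows, ignore_index=True)
# pandas >= 2.0 equivalent:
df = pd.concat([df, new_rows], ignore_index=True)

print(len(df))  # 3
```

Note that, like append, concat returns a new DataFrame, so the result has to be assigned back.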
Now inference is successful.
Will refactor the other .append
instances on DataFrames in the codebase and check what effect torch.compile
has on inference time over the test set.
I was able to successfully load the model onto the GPU and kick off inference.
However, when trying to get back the scores for the prediction, I get an error, no matter which disease_id I use...