kekegg / DLEPS

A Deep Learning based Efficacy Prediction System for drug discovery

Missing files in DLEPS/data #6

Open Nelhachem opened 3 years ago

Nelhachem commented 3 years ago

Hi, thanks for providing this nice, novel DL tool. However, some files are missing, and we would appreciate it if you could add them soon.

Without the first 2 files, the DLEPS algorithm won't work in either Colab or Jupyter notebooks. Thanks!

ssq1993 commented 3 years ago

We ran into the same problem.

Nelhachem commented 3 years ago

It is quite puzzling that none of the authors replied to my email. It is a nice Nature Biotech publication; however, we have the right to understand why some files are missing. If they are available from public sources, we would appreciate a link or another way to download the h5 file.

GemaRG96 commented 3 years ago

Hi! Same issue here. The denseweight.h5 file holds the weights for inferring the expression of the 12K genes from the 978 landmarks. I managed to find a file from the LINCS project that should correspond to these weights, but the other 2 files are still missing, and it would be safer if the authors also provided denseweight.h5, or at least the link or source they took it from.
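
For readers unfamiliar with this inference step: each of the remaining genes is predicted as a linear combination of the 978 landmark genes. The following is only a minimal NumPy sketch of that idea. It assumes a weight matrix with one intercept row followed by one row per landmark; that layout is an assumption, not the confirmed structure of denseweight.h5, and `infer_expression` and the toy shapes are hypothetical.

```python
import numpy as np

def infer_expression(landmarks, weights):
    """Predict inferred-gene expression from landmark expression.

    landmarks: array of shape (n_samples, n_landmarks)
    weights:   array of shape (n_landmarks + 1, n_genes); row 0 is ASSUMED
               to be the intercept (not confirmed for denseweight.h5).
    """
    landmarks = np.asarray(landmarks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.shape[0] != landmarks.shape[1] + 1:
        raise ValueError("expected one intercept row plus one row per landmark")
    intercept, coefs = weights[0], weights[1:]
    # Linear model: prediction = landmarks . coefs + intercept
    return landmarks @ coefs + intercept

# Toy example with random numbers (not real LINCS data).
rng = np.random.default_rng(0)
toy_landmarks = rng.normal(size=(4, 978))
toy_weights = rng.normal(size=(979, 5))
print(infer_expression(toy_landmarks, toy_weights).shape)  # (4, 5)
```

If the real file stores the intercept elsewhere, or stores genes as rows instead of columns, the slicing and the matrix product would need to be adapted accordingly.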

Nelhachem commented 3 years ago

Hi GemaRG96, I might have found the VAE hdf5: https://github.com/mkusner/grammarVAE/blob/master/pretrained/zinc_vae_grammar_L56_E100_val.hdf5 Would you please share the link to the denseweight.h5 file? It should be somewhere on the LINCS website, but the authors of the Nat Biotech paper did not add this information on GitHub.

GemaRG96 commented 3 years ago

Hi! Here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742 Inside the 'GSE92742_Broad_LINCS_auxiliary_datasets.tar.gz' file you will find the file 'DS_GEO_OLS_WEIGHTS_n979x21290.gctx', which contains the weights for the 21K inferred genes. The thing is that it is not an .h5 file and it has more than the 12K genes they use in DLEPS, so to use it we would have to create the .h5 file and make several assumptions about the order of the genes, etc. Therefore, even with the weights, I don't think we can use them without some guidance.
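
The gene-order assumption can at least be made explicit in code. The following is a minimal sketch, assuming the LINCS weights have already been loaded into a NumPy array with one column per gene and a matching list of gene IDs, and that the gene list DLEPS expects is known (it is not documented in this thread). `align_weights` and the toy gene IDs are hypothetical.

```python
import numpy as np

def align_weights(weights, source_genes, target_genes):
    """Select and reorder weight columns so they match target_genes.

    weights:      array of shape (n_rows, len(source_genes))
    source_genes: gene IDs labelling the columns of `weights`
    target_genes: gene IDs in the order the downstream model expects
    Raises KeyError listing any target gene absent from the source.
    """
    position = {gene: i for i, gene in enumerate(source_genes)}
    missing = [g for g in target_genes if g not in position]
    if missing:
        raise KeyError(f"genes not found in weight file: {missing}")
    cols = [position[g] for g in target_genes]
    return np.asarray(weights)[:, cols]

# Toy example: 4 source genes, keep 2 of them in a different order.
w = np.arange(12).reshape(3, 4)
aligned = align_weights(w, ["g1", "g2", "g3", "g4"], ["g4", "g2"])
print(aligned)
```

Failing loudly on missing genes, instead of silently dropping them, keeps any mismatch between the two gene lists visible. Whether genes sit on rows or columns in the downloaded file is itself an assumption to verify before using this.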

Nelhachem commented 3 years ago

Hi! Yes, that makes sense. One approach would be to test some ground-truth signature and see whether the h5 file from LINCS works as expected, until we get a reasonable answer from the authors.
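
One possible way to run such a check, sketched under assumptions: take samples for which both the landmark values and the full measured expression are available, predict with the candidate weights, and compute a per-gene Pearson correlation against the measured values. The function name and the synthetic data below are hypothetical; real signatures would have to come from LINCS.

```python
import numpy as np

def per_gene_correlation(predicted, measured):
    """Pearson correlation between predicted and measured values, per gene.

    Both arrays have shape (n_samples, n_genes); returns shape (n_genes,).
    A gene with constant values yields NaN (zero variance).
    """
    p = np.asarray(predicted, dtype=float)
    m = np.asarray(measured, dtype=float)
    p = p - p.mean(axis=0)
    m = m - m.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return (p * m).sum(axis=0) / denom

# Synthetic stand-in for real data: two genes predicted perfectly,
# one gene replaced by unrelated noise.
rng = np.random.default_rng(0)
measured = rng.normal(size=(50, 3))
predicted = measured.copy()
predicted[:, 2] = rng.normal(size=50)
r = per_gene_correlation(predicted, measured)
print(np.round(r, 2))
```

Correlations near 1 for most genes would suggest the weights and gene order are consistent; values scattered around 0 would point to a wrong file or a misaligned gene list.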

zqfang commented 3 years ago

@Nelhachem, I got no email replies either. I don't know why they are hiding these files.

I think there's an updated weight file here: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.gz

Do you have any idea what the benchmark.csv file is? How can one be generated?

CompBioT commented 3 years ago

We have run into the same problem. We would be grateful if the authors could reply to this issue. The model cannot be run at all unless the necessary files are provided, including the related input files, weights, and so on.

wuys13 commented 2 years ago

We ran into the same problem.

jfckkiu commented 2 years ago

We also ran into the same problem. I can't believe it still hasn't been solved.

joey0214 commented 2 years ago

> @Nelhachem, I got no email replies either. I don't know why they are hiding these files.
>
> I think there's an updated weight file here: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.gz
>
> Do you have any idea what the benchmark.csv file is? How can one be generated?

How did you process this file?

tqinger commented 1 year ago

Excuse me, can you use this model?

joey0214 commented 1 year ago

> Excuse me, can you use this model?

Nope, the "denseweight.h5" file is missing.

dreamfly999 commented 1 year ago

The "denseweight.h5" link is https://kaggle.com/datasets/b0a096e3c550146f2a786f0ffd3c8bd37d68b04c7b09697efd282f91f8f6e36f Was it recently updated by the author? I would also like to know where the "benchmark.csv" file is. I hope to get the authors' guidance. Has anybody tried the script?

ACDBio commented 1 year ago

Cool paper, and benchmark.csv (the average expression levels for the 978 genes) is badly needed to calculate enrichment.

muralikrishnasn commented 1 year ago

Hi, where can we get the benchmark.csv file?