rvinas / adversarial-gene-expression

Adversarial generation of gene expression data using Generative Adversarial Networks
MIT License

Trained models are not accessible #2

Status: Open. miltondp opened this issue 3 years ago

miltondp commented 3 years ago

Thank you for this code and the publication, which I found very interesting. I would like to use the trained GAN models, but I cannot find them in this repo. Could you please point me to the files or push them here? It seems they should be in checkpoints/models/, but that folder does not exist. The input data is also missing. Thanks again!

rvinas commented 3 years ago

Hi Milton, thanks for the feedback. I've just uploaded a checkpoint of the model here. I somehow lost the weights of the model we used in the paper, so I had to retrain it from scratch. Unfortunately, I do not have permission to upload the data here, but it is publicly available at this repo. I hope this helps, best wishes!

miltondp commented 3 years ago

Hi @rvinas, thank you for your help, and sorry for the delay.

I want to use your model exactly as you trained it, to simulate the same data you generated for the paper. However, I couldn't find instructions for doing that (I'm not an expert in the models you used, but I'm interested in the data they can simulate). Could you provide them?

miltondp commented 3 years ago

In particular, I want to recreate Table 1 of your paper using a different correlation coefficient (not Pearson). If I can get the real data (you already told me how to get it) and the simulated data from all three methods (or at least from your method), I will be able to do it.

miltondp commented 3 years ago

Hi @rvinas, just checking in to see if you had a chance to look at this. I would love to use the models you published in Bioinformatics, but it is very hard with the current code and documentation. Could you provide some guidelines for using your models to simulate the data or, even simpler, a way to download the data you simulated? Thanks

rvinas commented 3 years ago

Hi @miltondp, apologies for the delay; I am swamped with work, but I'll look into this as soon as possible. Is it the synthetic human gene expression data that you would like? I could generate some data and upload a copy. Some of the metrics in Table 1 require the underlying gene regulatory network, which I don't have for human data.

miltondp commented 3 years ago

Hi @rvinas, thank you again for your prompt response. I'm also busy and might respond with a delay as well.

Yes, I'm mainly interested in the synthetic human gene expression data generated with your simulator, although it would be great to have the E. coli data too if you can provide it. Specifically, I would like to recompute one of the general metrics (Section 4.3.1.1 of your paper, $S_{dist}$) only for your GAN method (and the random and real rows in Table 1), but with a different correlation coefficient (not Pearson). So I won't need any GRN for SynTReN or GNW. As I understand it, you compute the pairwise correlations for the real (test set) and the artificial (generated from the training set) datasets, and then compute the correlation again between these distances to assess simulation performance.

Ideally, it would be awesome to have both the real and the synthetic data, with their training and test sets: that is, the synthetic data with 680 samples for E. coli and 2287 samples for human (GTEx + TCGA), including the training and test sets. I know you told me that you cannot upload the real data, but if you could provide a URL to download it, or the script to download and preprocess it, just to make sure I'm using the right genes and samples, that would be great.

rvinas commented 3 years ago

Hi @miltondp, two updates:

  1. I have uploaded the real (train and test) and generated human transcriptomics data here. I will grant you access as soon as you request it. The real data was downloaded from this repo (see the 'data' folder). We then used this code to merge the tissue-specific data into a single dataset.
  2. Here is a notebook showing how to use the trained model to synthesize human data. The synthetic data in the folder above was generated with this notebook.
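The merging step in item 1 is done by the linked code. As a rough illustration only, assuming each tissue is stored as a pandas DataFrame of samples × genes (a hypothetical layout, not necessarily the repo's), merging could keep the genes shared by all tissues and stack the samples, recording each sample's tissue of origin. `merge_tissues` is a hypothetical helper.

```python
import pandas as pd


def merge_tissues(tables):
    """Stack per-tissue samples x genes tables on their shared genes.

    tables: dict mapping a tissue name to a DataFrame (rows = samples,
    columns = genes). Returns the merged expression matrix and a Series
    with the tissue label of every sample.
    """
    # Keep only the genes measured in every tissue, in a fixed (sorted) order.
    shared = sorted(set.intersection(*(set(df.columns) for df in tables.values())))
    merged = pd.concat([df[shared] for df in tables.values()], axis=0)
    # One tissue label per sample, aligned with the merged rows.
    labels = pd.Series(
        [name for name, df in tables.items() for _ in range(len(df))],
        index=merged.index,
        name="tissue",
    )
    return merged, labels
```

Intersecting the gene sets avoids introducing missing values; an outer join with imputation would be an alternative if tissues were profiled on different gene panels.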

That's all I have for now, I hope this is helpful!

miltondp commented 3 years ago

Awesome, thank you so much, @rvinas! This is indeed super helpful. I already requested access.

noob9721 commented 2 years ago
> Here is a notebook showing how to use the trained model to synthesize human data. The synthetic data in the folder above was generated with this notebook.

Hi @rvinas, I am unable to access this code. I need to synthesize human microarray and RNA-seq data with the same characteristics, which is how I landed on this project. Could you please explain how to go about generating synthetic data with this?

rvinas commented 2 years ago

Hi @noob9721, my fault, that link pointed to the wrong URL (updated now). This is the notebook.