miltondp opened 3 years ago
Hi Milton, thanks for the feedback. I've just uploaded a checkpoint of the model here. I somehow lost the weights of the model that we used in the paper - I had to retrain it from scratch. Unfortunately, I do not have permission to upload the data here, but it is publicly available at this repo. I hope this helps, best wishes!
Hi @rvinas, thank you for your help, and sorry for the delay.
I want to use your model exactly as you trained it to simulate the same data you generated for the paper. However, I didn't find the instructions to do that (I'm not an expert in the models you used, but I'm interested in the data you can simulate). Would it be possible to provide that?
Something I want to do is to recreate your Table 1 in the paper but using a different correlation coefficient (not Pearson). So if I can get the real data (you already told me how to get that) and the simulated data with all three methods (or at least your method) I would be able to do it.
Hi @rvinas, just touching base to see if you had a chance to look at this. I would love to use the models you published in Bioinformatics, but that is very hard with the current code and documentation. Would it be possible to provide some guidelines for using your models to simulate the data or, even simpler, to make the data you simulated available for download? Thanks
Hi @miltondp, apologies for the delay - I am swamped with work, but I'll look into this as soon as possible. Is it the synthetic human gene expression that you would like to have? I could generate some data and upload a copy. Some of the metrics of Table 1 require the underlying gene regulatory network to be computed (which I don't have for human data)
Hi @rvinas, thank you again for your prompt response. I'm also busy and might respond with a delay as well.
Yes, I'm mainly interested in the synthetic human gene expression generated with your simulator, although it would be great to have it for E. coli too if you can provide that. Actually, I would like to recompute one of the general metrics (Section 4.3.1.1 of your paper, $S_{dist}$) only for your GAN method (and the random and real rows in Table 1) but with a different correlation coefficient (not Pearson). So I won't need any GRN for SynTReN or GNW. As I understand it, you compute the pairwise correlations within the real data (test set) and within the artificial dataset (generated from the training set), and then compute the correlation again between these two sets of distances to assess simulation performance.
Ideally, it would be awesome to have both the real and the synthetic data, both training and test sets. This would be the synthetic data with 680 samples for E. coli and 2287 samples for human/GTEx + TCGA, including the training and test sets. I know you told me that you cannot upload the real data, but if you could provide a URL to download it, or the script to download/preprocess it, just so I can make sure I'm using the right genes and samples, that would be great.
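To make the request concrete, here is a minimal sketch of the computation as I understand it from the description above. It is not taken from the repository or the paper. The function names (`rank_columns`, `gene_correlation`, `s_dist`), the use of `1 - correlation` as the distance, and Spearman as the replacement coefficient are all my own assumptions, and the data in the demo is random toy data, not real or GAN-generated expression.

```python
import numpy as np


def rank_columns(x):
    """Rank each gene (column) across samples.

    Simplification: ties receive arbitrary consecutive ranks rather than
    averaged ranks, which is adequate for continuous expression values.
    """
    return np.argsort(np.argsort(x, axis=0), axis=0).astype(float)


def gene_correlation(x, method="spearman"):
    """Gene-by-gene correlation matrix for x of shape (samples, genes)."""
    if method == "spearman":
        x = rank_columns(x)  # Spearman = Pearson on ranks
    return np.corrcoef(x, rowvar=False)


def upper_triangle(m):
    """Flatten the strictly upper triangle (each gene pair once)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def s_dist(real, synthetic, method="spearman"):
    """Hypothetical S_dist: correlation between the gene-gene distance
    vectors of two datasets, with distance assumed to be 1 - correlation."""
    d_real = 1.0 - upper_triangle(gene_correlation(real, method))
    d_syn = 1.0 - upper_triangle(gene_correlation(synthetic, method))
    # The outer coefficient is kept as Pearson here (an assumption).
    return np.corrcoef(d_real, d_syn)[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_genes, n_factors = 200, 20, 3
    # Toy data: shared loadings give both datasets the same gene structure.
    loadings = rng.normal(size=(n_factors, n_genes))

    def simulate():
        latent = rng.normal(size=(n_samples, n_factors))
        noise = 0.5 * rng.normal(size=(n_samples, n_genes))
        return latent @ loadings + noise

    real, synthetic = simulate(), simulate()
    random_data = rng.normal(size=(n_samples, n_genes))
    print("same structure:", s_dist(real, synthetic))
    print("random baseline:", s_dist(real, random_data))
```

Passing `method="pearson"` skips the rank transform, so the same function could be used to cross-check against the Pearson values in Table 1, provided the definition above matches the one in the paper.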
Hi @miltondp, two updates:
That's all I have for now, I hope this is helpful!
Awesome, thank you so much, @rvinas! This is indeed super helpful. I already requested access.
- Here is a notebook showing how to use the trained model to synthesize human data. The synthetic data in the folder above was generated with this notebook.
Hi @rvinas, I am unable to access this code. I need to synthesize human microarray and RNA-seq data with the same characteristics, which is how I landed on this project. Could you please explain how to generate synthetic data with it?
Thank you for this code and the publication, which I found very interesting. I would like to use the trained GAN models, but I cannot find them in this repo. Would you please indicate where the files are or push them here? It seems they should be in `checkpoints/models/`, but the folder does not exist. The input data is also missing. Thanks again!