Closed: Sithara85 closed this issue 2 years ago
I just tried a trick on the data. I noticed that all gene expression values in Gregory's dataset range from 0 to 1, so I multiplied my log2 CPM data by 0.01 so that all values fall in the 0-1 range. Now I have better results to visualize, so I am wondering where in the code the input tensors are set to range from 0-1.
New input_rnaseq_reconstruct:
XXbac-BPG248L24.12 | TTN | RP11-290D2.6 | JSRP1 | RP11-115D19.1 | HCG4P5 | AC114271.2 | RP3-394A18.1 | ABALON | KB-1208A12.3 | ... | LNPK | NBPF15 | ATP8B4 | AC005522.7 | CHID1 | ARFRP1 | NAPB | CTB-133G6.2 | SPATA24 | POU2F2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.114204 | 0.115338 | 0.114985 | 0.106858 | 0.102436 | 0.109176 | 0.109893 | 0.109055 | 0.09779 | 0.107059 | ... | 0.022572 | 0.020735 | 0.022362 | 0.020162 | 0.021934 | 0.019642 | 0.021585 | 0.020372 | 0.019506 | 0.020419 |
0.106944 | 0.105709 | 0.105965 | 0.104501 | 0.104928 | 0.102605 | 0.097524 | 0.097808 | 0.10184 | 0.098146 | ... | 0.021987 | 0.020497 | 0.019832 | 0.018447 | 0.018818 | 0.019570 | 0.018344 | 0.020552 | 0.018087 | 0.020798 |
and gene_summary:

gene mean | gene abs(sum) |
---|---|
0.004425 | 0.027955 |
0.004736 | 0.023202 |
0.004456 | 0.022163 |
0.004741 | 0.020509 |
0.004347 | 0.017307 |
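A possible explanation for why the rescaling helped, sketched under an assumption this thread does not confirm: that the decoder ends in a sigmoid output trained with binary cross-entropy. In that setup the gradient of the loss with respect to the output logit is `sigmoid(z) - target`, so any target above 1 pushes the logit upward without bound and the output saturates at 1.0. The helper `fit_output_logit` below is hypothetical and only illustrates that behaviour on a single value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_output_logit(target, steps=500, lr=0.5):
    """Gradient descent on one logit z, minimizing BCE(target, sigmoid(z)).

    Hypothetical helper for illustration only; not part of the repository.
    """
    z = 0.0
    for _ in range(steps):
        # d/dz of -[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(z) is (s - t)
        z -= lr * (sigmoid(z) - target)
    return sigmoid(z)

# A target inside [0, 1] is recovered; a target outside it drives the output to 1.0
in_range = fit_output_logit(0.3)
out_of_range = fit_output_logit(7.5)  # e.g. an unscaled log2 CPM value
print(in_range, out_of_range)
```

If that assumption holds, any input value above 1 would be reconstructed as roughly 1.0, which matches the behaviour reported before the rescaling.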
Thanks @Sithara85 - a couple things:
> after disabling the eager execution to make your program work in Tensorflow 2.
Can you elaborate what your solution was? Perhaps others will see this and will be interested in knowing exactly what you had to change.
> But when we use our gene expression data (which is log 2 cpm normalized data), I am getting all the reconstructed values as 1.0
The input data need to be normalized further to be in the range of 0-1. See `process-data.ipynb` for specific details.
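For readers who want a concrete picture of 0-1 normalization, here is a minimal sketch of per-gene min-max scaling with NumPy. The thread does not say exactly how `process-data.ipynb` does this, so treat the function name `min_max_scale_genes` and the per-gene (column-wise) choice as assumptions. The input is synthetic data standing in for a samples-by-genes log2 CPM matrix.

```python
import numpy as np

def min_max_scale_genes(expr):
    """Scale each gene (column) to [0, 1] across samples (rows).

    Hypothetical helper; the notebook's own scaling may differ.
    """
    expr = np.asarray(expr, dtype=float)
    gene_min = expr.min(axis=0)
    gene_range = expr.max(axis=0) - gene_min
    # Constant genes have zero range; divide by 1 instead so they map to 0
    safe_range = np.where(gene_range == 0, 1.0, gene_range)
    return (expr - gene_min) / safe_range

# Synthetic stand-in for a (samples x genes) log2 CPM matrix
rng = np.random.default_rng(0)
log2cpm = rng.normal(loc=5.0, scale=3.0, size=(6, 4))
scaled = min_max_scale_genes(log2cpm)
print(scaled.min(), scaled.max())
```

Unlike multiplying by a fixed constant such as 0.01, this guarantees every gene spans the full 0-1 interval, including genes with negative log2 CPM values.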
> Could you let me know if you can think of any issue with my input data shape. Dimension: (3045, 10956)
I recommend reducing the number of gene features you're using. In `process-data.ipynb`, you will also see that we reduced gene dimensions by selecting the top 5,000 most variably expressed genes.
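A rough sketch of that kind of feature reduction, assuming plain per-gene variance as the variability measure (the notebook may use a different statistic, such as median absolute deviation). The function `top_variable_genes` and the toy gene names are hypothetical; with real data you would pass `k=5000`.

```python
import numpy as np

def top_variable_genes(expr, gene_names, k):
    """Keep the k genes (columns) with the highest variance across samples.

    Hypothetical helper; variance is an assumed stand-in for the
    notebook's variability measure.
    """
    expr = np.asarray(expr, dtype=float)
    variances = expr.var(axis=0)
    top_idx = np.argsort(variances)[::-1][:k]
    top_idx = np.sort(top_idx)  # preserve the original column order
    return expr[:, top_idx], [gene_names[i] for i in top_idx]

# Synthetic example: five genes with very different spreads
rng = np.random.default_rng(0)
scales = np.array([0.1, 5.0, 1.0, 3.0, 0.5])
expr = rng.normal(size=(200, 5)) * scales
names = ["g0", "g1", "g2", "g3", "g4"]
subset, kept = top_variable_genes(expr, names, k=2)
print(subset.shape, kept)
```

Selecting genes before the 0-1 scaling step matters here, because min-max scaling changes each gene's variance.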
Hi Gregory,
I am very happy to see how you have detailed the steps for the gene expression VAE-based model. I am doing some analysis on a gene expression prediction model to classify dementia. I have started learning the applications of VAE models and other machine learning models in omics prediction. I am also new to TensorFlow/Keras.
I successfully ran your code on your gene expression data after disabling eager execution to make the program work in TensorFlow 2. But when I use our gene expression data (which is log2 CPM normalized), all the reconstructed values come out as 1.0, so my gene_mean and gene_summary remain the same. I checked your data distribution (it looks like your gene expression data are in the range of 0-1).
Could you let me know if you can think of any issue with my input data shape.
Input:
Dimension: (3045, 10956) data:
input_rnaseq_reconstruct.head(2):
gene_summary: