xiaoyeye / CNNC

convolutional neural network based coexpression analysis
MIT License

Expression data #4

Closed deytonmoy000 closed 4 years ago

deytonmoy000 commented 4 years ago

I could not find the expression data (h5 files) used for this experiment. What is the shape of the expression data before it is processed into NEPDF format? Is it N x 32 (N: number of genes, with 32 samples/conditions per gene), or do you compress the information from a much larger feature set down to just 32 for each gene?

xiaoyeye commented 4 years ago

Hi, I am sorry for the delay; I did not receive any reminder. The expression data can be downloaded from the AWS links in the 'Data sources' part, for example 'https://s3.amazonaws.com/mousescexpression/rank_total_gene_rpkm.h5'.

The expression data has shape N*M, where N is the number of cells and M is the number of genes.

The NEPDF data has shape X * 32 * 32, where X is the length of the gene pair list you would like to train and test on. For example, there are M * M possible gene pairs in total, but you focus on only X of them, and you generate one 32 * 32 histogram matrix for each of those X pairs. Here 32 means that the expression range of each gene is uniformly divided into 32 bins. The whole idea behind CNNC can be found in the paper 'Deep learning for inferring gene relationships from single-cell expression data'. The compression happens through the 32 * 32 histogram generation for each gene pair. Best
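To make the shapes concrete, here is a minimal sketch of the histogram step described above, assuming a plain NumPy array of shape N * M (cells by genes) filled with synthetic values. The function names `pair_histogram` and `build_nepdf` are hypothetical and not taken from the CNNC repository, and the real preprocessing may apply extra steps (such as transforming or normalizing expression values) that are not shown here.

```python
import numpy as np


def pair_histogram(expr, gene_a, gene_b, bins=32):
    """Return a bins x bins 2D histogram for one gene pair.

    expr is an (N cells, M genes) array; gene_a and gene_b are column
    indices. Each gene's observed range is split into `bins` uniform bins.
    """
    x = expr[:, gene_a]
    y = expr[:, gene_b]
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    total = counts.sum()
    # Normalize to a joint probability estimate (an assumption of this sketch).
    return counts / total if total > 0 else counts


def build_nepdf(expr, pairs, bins=32):
    """Stack one histogram per gene pair into an (X, bins, bins) array."""
    return np.stack([pair_histogram(expr, a, b, bins) for a, b in pairs])


# Synthetic stand-in for the expression matrix: N = 500 cells, M = 10 genes.
rng = np.random.default_rng(0)
expr = rng.gamma(shape=2.0, scale=1.0, size=(500, 10))

# X = 3 gene pairs chosen out of the M * M possible ones.
pairs = [(0, 1), (2, 3), (4, 9)]
nepdf = build_nepdf(expr, pairs)
print(nepdf.shape)
```

With X = 3 pairs, the resulting array has one 32 * 32 matrix per pair, regardless of how many cells N the expression matrix contains. That is the sense in which the histogram compresses the data.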