nanxstats / blog-comments

blog/post/exp2vec/ #18

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Training Tissue-Specific Gene Embeddings on GTEx Data - Nan Xiao | 肖楠

In this post, I showed how to train tissue-specific gene embeddings using GTEx data and text2vec. Two applications are presented to measure gene similarities and discover linear algebraic structure of genes.

https://nanx.me/blog/post/exp2vec/

BoxiLin commented 1 year ago

Very interesting work! I have a question/comment: each gene has a location in the genome (e.g., BRCA1 is on chromosome 17: 43.04 – 43.17 Mb). It seems that gene location was not considered here. However, it is not surprising that genes located close to each other tend to have similar expression (possibly due to linkage disequilibrium?).

I think it might be interesting to look at pairs of genes with a small genomic distance but a large expression distance based on your output gene embeddings, or vice versa.
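One rough way to sketch this comparison, assuming the embeddings and gene start coordinates are available as plain dictionaries. The gene names, coordinates, vectors, and both thresholds below are made-up toy values, and `discordant_pairs` is a hypothetical helper, not something from the post:

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def discordant_pairs(positions, embeddings, max_bp=1_000_000, min_cos_dist=0.5):
    """Same-chromosome gene pairs that are close on the genome
    but far apart in embedding space (thresholds are arbitrary)."""
    hits = []
    for g1, g2 in combinations(sorted(positions), 2):
        chrom1, start1 = positions[g1]
        chrom2, start2 = positions[g2]
        if chrom1 != chrom2:
            continue  # genomic distance is only defined within a chromosome
        bp = abs(start1 - start2)
        dist = cosine_distance(embeddings[g1], embeddings[g2])
        if bp <= max_bp and dist >= min_cos_dist:
            hits.append((g1, g2, bp, dist))
    # Most dissimilar pairs first
    return sorted(hits, key=lambda h: -h[3])


# Toy data (hypothetical): gene -> (chromosome, start position in bp)
positions = {
    "GENE_A": ("chr17", 43_000_000),
    "GENE_B": ("chr17", 43_200_000),
    "GENE_C": ("chr17", 43_400_000),
    "GENE_D": ("chr2", 43_100_000),
}
# Toy 2-d "embeddings" (hypothetical)
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.9, 0.1],
    "GENE_C": [0.0, 1.0],
    "GENE_D": [0.0, 1.0],
}

pairs = discordant_pairs(positions, embeddings)
for g1, g2, bp, dist in pairs:
    print(g1, g2, bp, round(dist, 3))
```

Swapping the two inequalities would give the "vice versa" case: pairs that are far apart on the genome (or on different chromosomes) but close in embedding space.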

nanxstats commented 1 year ago

very interesting work! i have a question/comment: each gene has its location in the genome...

@BoxiLin Thanks. My take is that gene location is an important part of the semantic knowledge about genes, and that it is at least partially reflected in each gene's expression context (how strongly the other genes are expressed alongside that gene). The temporal order of expression would be another useful context to have, if technically feasible.

That being said, if necessary, such knowledge about genes can be explicitly incorporated into the trained embeddings with a simple post-processing algorithm such as "retrofitting" (https://arxiv.org/abs/1411.4166). The idea is similar to those developed in (supervised) distance metric learning.
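For illustration, here is a minimal sketch of the retrofitting update from that paper, using its default weights (alpha_i = 1, beta_ij = 1 / degree(i)), so each update averages a gene's original vector with the mean of its neighbors' current vectors. Treating genes that sit near each other on the genome as graph neighbors is an assumption made here for the example, and the gene names and vectors are toy values:

```python
def retrofit(embeddings, neighbors, n_iter=10):
    """Pull each vector toward its graph neighbors while keeping it
    close to its original value (retrofitting with alpha_i = 1 and
    beta_ij = 1 / degree(i))."""
    new = {g: list(v) for g, v in embeddings.items()}
    for _ in range(n_iter):
        for gene, nbrs in neighbors.items():
            nbrs = [n for n in nbrs if n in new]
            if gene not in new or not nbrs:
                continue  # genes without usable neighbors keep their vector
            dim = len(new[gene])
            mean_nbr = [sum(new[n][k] for n in nbrs) / len(nbrs) for k in range(dim)]
            # (sum_j beta_ij * q_j + alpha_i * q_hat_i) / (sum_j beta_ij + alpha_i)
            new[gene] = [(m + o) / 2.0 for m, o in zip(mean_nbr, embeddings[gene])]
    return new


# Toy embeddings (hypothetical)
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [1.0, 1.0],
}
# Hypothetical neighbor graph: GENE_A and GENE_B are assumed to be adjacent on the genome
neighbors = {"GENE_A": ["GENE_B"], "GENE_B": ["GENE_A"]}

retrofitted = retrofit(embeddings, neighbors)
print(retrofitted)
```

In this toy graph, GENE_A and GENE_B move toward each other while GENE_C, which has no neighbors, is left unchanged.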