uci-cbcl / D-GEX

Deep learning for gene expression inference
GNU General Public License v2.0

Same genes in landmark set and target set #7

Closed: lucananni93 closed this issue 5 years ago

lucananni93 commented 5 years ago

Hi,

I am a PhD student at Polytechnic University of Milan and I am interested in the D-GEX work, of which you are one of the authors. I would like to ask how you extracted the two lists of landmark and target genes, map_lm.txt and map_tg.txt, in the GitHub repository.

I have noticed that a subset of the landmark genes (roughly 400 genes) also appears in the set of target genes, although under different probe IDs. What is the reason behind this?

Thank you in advance.

yil8 commented 5 years ago

@lucananni93 Hi, map_lm.txt and map_tg.txt were selected and provided by the original authors of the LINCS program. They used to have a website you could check out, but unfortunately it is no longer accessible.

lucananni93 commented 5 years ago

Thank you. What about the redundancy between the set of landmark and target genes?

yil8 commented 5 years ago

@lucananni93 Honestly, I'm not sure about that, but as long as the probe ID is different, it should be reasonable. I do remember that they picked the landmark genes out of the target genes through PCA rather than for biological reasons. That might partially explain why there are overlaps in gene symbols.
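
For anyone who wants to check the overlap themselves, here is a minimal sketch in Python. It assumes each map file is tab-separated with the probe ID in the first column and the gene symbol in the second. That layout is an assumption, not something confirmed in this thread, so adjust `probe_col` and `symbol_col` to match the real files. The helper names `read_map` and `overlap` and the toy probe IDs and gene symbols are made up for illustration.

```python
import io


def read_map(handle, probe_col=0, symbol_col=1, sep="\t"):
    """Return {probe_id: gene_symbol} from a delimited mapping file.

    The column positions are assumptions about the file layout.
    """
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= max(probe_col, symbol_col):
            continue  # skip blank or short lines
        mapping[fields[probe_col]] = fields[symbol_col]
    return mapping


def overlap(lm, tg):
    """Return (shared probe IDs, shared gene symbols) between two maps."""
    shared_probes = set(lm) & set(tg)
    shared_symbols = set(lm.values()) & set(tg.values())
    return shared_probes, shared_symbols


# Toy stand-ins for map_lm.txt and map_tg.txt (hypothetical content).
lm_text = "p1_at\tGENEA\np2_at\tGENEB\n"
tg_text = "p3_at\tGENEA\np4_at\tGENEC\n"

lm = read_map(io.StringIO(lm_text))
tg = read_map(io.StringIO(tg_text))

shared_probes, shared_symbols = overlap(lm, tg)
print(len(shared_probes), sorted(shared_symbols))
```

With the real files, replace the `io.StringIO` objects with opened file handles, for example `with open("map_lm.txt") as f: lm = read_map(f)`. An empty set of shared probe IDs together with a non-empty set of shared gene symbols would match the situation described above: the same gene measured by different probes in the two lists.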