greenelab / pancancer

Building classifiers using cancer transcriptomes across 33 different cancer-types
BSD 3-Clause "New" or "Revised" License

Cancer type prediction #105

Closed. rezacsedu closed this issue 4 years ago

rezacsedu commented 4 years ago

Can I combine gene expression, mutation, and copy number data for cancer type prediction?

gwaybio commented 4 years ago

Nothing stopping you! Although cancer type (i.e. tissue type) is an easy signal to find.

rezacsedu commented 4 years ago

Thanks a million for the quick reply. Sorry, but I have one more stupid question, re: issue #102 (https://github.com/greenelab/pancancer/issues/102):

Can the same labels (i.e., https://github.com/greenelab/pancancer/blob/master/data/sample_freeze.tsv) be used to, let's say, create a shared representation of the features from the 3 datasets and train a multimodal network?

gwaybio commented 4 years ago

Yep! So the idea is that you'd train a model to detect something (say cancer-type) but the real endpoint you care about is how the multimodal network combines features together? (and then interpret the combinations?)
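
Reusing one label table across three data types mostly comes down to keeping only the samples that appear in every table, in the same order. A minimal sketch of that alignment step, using made-up sample IDs, values, and labels (none of this is read from sample_freeze.tsv, and `align_modalities` is a hypothetical helper, not part of this repository):

```python
# Toy stand-ins for the three data types, keyed by sample ID.
# All IDs and values here are invented for illustration.
expression = {"S1": [2.1, 0.3], "S2": [0.4, 1.8], "S3": [1.0, 1.1]}
mutation = {"S1": [1, 0], "S2": [0, 1], "S4": [1, 1]}
copy_number = {"S1": [0.5], "S2": [-0.2], "S3": [0.0]}

# Hypothetical label table: sample ID -> cancer type.
labels = {"S1": "BRCA", "S2": "LUAD", "S3": "BRCA", "S4": "LUAD"}


def align_modalities(modalities, labels):
    """Keep samples present in every modality and in the label table."""
    shared = set(labels)
    for table in modalities:
        shared &= set(table)
    sample_ids = sorted(shared)
    # One list of feature rows per modality, all in the same sample order.
    views = [[table[s] for s in sample_ids] for table in modalities]
    y = [labels[s] for s in sample_ids]
    return sample_ids, views, y


sample_ids, views, y = align_modalities(
    [expression, mutation, copy_number], labels
)
print(sample_ids)  # ['S1', 'S2']
print(y)           # ['BRCA', 'LUAD']
```

Each entry of `views` can then feed its own input branch of a multimodal network, with `y` as the shared target.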

rezacsedu commented 4 years ago

"you care about is how the multimodal network combines features together?" Yes, exactly.

gwaybio commented 4 years ago

Although with such a strong signal as cancer type, I worry that the models will be lazy and won't try to integrate the data types much.
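
One rough way to probe that worry is to score each data type on its own and compare against the combined input: if a single data type already matches the combined score, the model has little incentive to integrate. A toy sketch, assuming a hand-rolled nearest-centroid classifier and invented numbers in which expression alone separates the classes (this is not the repository's method or data):

```python
def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Fit one centroid per class on the training rows, then score test rows."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in sums[c]] for c in sums}

    def predict(row):
        # Pick the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    hits = sum(predict(row) == label for row, label in zip(test_X, test_y))
    return hits / len(test_y)


# Invented data: expression is informative, mutation is not.
train_y = ["BRCA", "BRCA", "LUAD", "LUAD"]
test_y = ["BRCA", "LUAD"]
expr_train, expr_test = [[5.0], [5.2], [0.1], [0.3]], [[4.9], [0.2]]
mut_train, mut_test = [[1.0], [0.0], [1.0], [0.0]], [[1.0], [0.0]]

# Combined input: concatenate the per-sample feature rows.
both_train = [e + m for e, m in zip(expr_train, mut_train)]
both_test = [e + m for e, m in zip(expr_test, mut_test)]

scores = {
    "expression": nearest_centroid_accuracy(expr_train, train_y, expr_test, test_y),
    "mutation": nearest_centroid_accuracy(mut_train, train_y, mut_test, test_y),
    "combined": nearest_centroid_accuracy(both_train, train_y, both_test, test_y),
}
print(scores)
```

In this toy setup the expression-only score equals the combined score, which is the pattern to look for before reading much into how a multimodal model combines its inputs.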