openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License

[Task Proposal] - Marker genes identification #110

Open aopisco opened 3 years ago

aopisco commented 3 years ago

Describe the problem concisely. Cell types have historically been characterized by morphology and functional assays, but one of the main advantages of single-cell transcriptomics is its ability to define cell identities and their underlying regulatory networks. We have developed scRFE to produce interpretable gene lists for all cell types in a given dataset; these lists can be used to design experiments or to aid biological data mining. It would be great to benchmark scRFE and other methods from the community.

Propose datasets. So far we have used Tabula Muris Senis.

Propose methods. NEED TO DO

Propose metrics. Compare with ground-truth marker genes (https://docs.google.com/spreadsheets/d/1SsKS2vMvZqLdJ8hOv_dBeXo3k4Gcwcy-oPBhFTCo0tY/edit?usp=sharing).
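
One possible way to make "compare with ground-truth marker genes" concrete is a set-overlap score per cell type. The sketch below is only an illustration and is not part of scRFE or openproblems: the function name `marker_overlap_scores`, the choice of precision, recall, and Jaccard index, and the case-insensitive matching of gene symbols are all assumptions, and the gene lists are illustrative placeholders rather than entries taken from the spreadsheet.

```python
from typing import Dict, Iterable, Set


def marker_overlap_scores(
    predicted: Iterable[str], reference: Iterable[str]
) -> Dict[str, float]:
    """Score one cell type's predicted markers against a reference list.

    Hypothetical helper: gene symbols are matched case-insensitively,
    which is an assumption and may not suit every species or annotation.
    """
    pred: Set[str] = {gene.upper() for gene in predicted}
    ref: Set[str] = {gene.upper() for gene in reference}
    hits = pred & ref
    union = pred | ref

    # Guard against empty lists so the function never divides by zero.
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    jaccard = len(hits) / len(union) if union else 0.0
    return {"precision": precision, "recall": recall, "jaccard": jaccard}


# Illustrative placeholder lists for a single cell type.
predicted = ["Cd3e", "Cd3d", "Cd4", "Actb"]
reference = ["Cd3e", "Cd3d", "Cd8a"]

scores = marker_overlap_scores(predicted, reference)
print(scores)
```

Low precision here would flag predicted genes that are absent from the reference list, while low recall would flag known markers that the method missed. Because a literature-derived list is incomplete, a predicted gene that is missing from it is not necessarily a false positive.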

LuckyMD commented 3 years ago

I like the idea of this and think benchmarking marker gene identification would be very impactful... but how do you define ground truth marker genes? Are these genes derived from the literature (usually surface markers)? Or are these markers that are validated across batches and datasets (sounds like a very tough task)?

aopisco commented 3 years ago

We have a compiled list of marker genes from the literature and expert groups that are mostly but not exclusively surface markers. We could start there?

LuckyMD commented 3 years ago

I think that sounds fair. Ideally this task would have a second dataset with marker genes that are not mostly surface markers to show some diversity of test cases. Could we come up with a second dataset as well?