rnabioco / someta

Inclusion of processed cell metadata improves single cell sequencing analysis reproducibility and accessibility
https://rnabioco.github.io/someta/
MIT License

Compare studies on GEO to curated database of single cell data #3

Closed: kriemo closed 4 years ago

kriemo commented 4 years ago

Use this preprint: https://www.biorxiv.org/content/10.1101/742304v2

and database: https://docs.google.com/spreadsheets/d/1En7-UV0k0laDiIfjFkdn7dggyR7jIk3WH8QgXaMOZF0/edit#gid=0

raysinensis commented 4 years ago

Overlap for human datasets (fraction of h_ids also present in gds_h$id):

length(intersect(gds_h$id, h_ids)) / length(h_ids)
[1] 0.8728814

Overlap for mouse datasets (fraction of m_ids also present in gds_m$id):

length(intersect(gds_m$id, m_ids)) / length(m_ids)
[1] 0.8948949
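For anyone wanting to reproduce this kind of check, here is a minimal sketch of the same overlap calculation on toy data. The helper name `overlap_fraction` and the accession IDs are made up for illustration; the real `gds_h`, `h_ids`, `gds_m` and `m_ids` objects are not shown in this thread. Unlike the one-liners above, the sketch de-duplicates the reference IDs first, which is an assumption about what is wanted, not something the thread confirms.

```r
# Hypothetical helper (not part of someta): fraction of reference IDs
# that also appear among the IDs returned by a search.
overlap_fraction <- function(found_ids, reference_ids) {
  # De-duplicate so repeated reference IDs do not inflate the denominator
  # (assumption: the one-liners in the thread do not do this).
  reference_ids <- unique(reference_ids)
  length(intersect(found_ids, reference_ids)) / length(reference_ids)
}

# Made-up accessions, for illustration only.
gds_demo <- data.frame(
  id = c("GSE0001", "GSE0002", "GSE0003"),
  stringsAsFactors = FALSE
)
curated_demo <- c("GSE0002", "GSE0003", "GSE0004", "GSE0005")

# 2 of the 4 reference IDs are found, so this is 0.5.
overlap_fraction(gds_demo$id, curated_demo)

# Reference IDs missed by the search: candidates for spot checking.
missed <- setdiff(curated_demo, gds_demo$id)
missed
```

Listing the missed IDs with `setdiff()` is one way to pick cases to inspect by hand.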

raysinensis commented 4 years ago

Spot-checking cases:

  1. superseries where we got the correct subseries
  2. data not available <- this might be another hill to die on
  3. only used the term "scRNAseq"
  4. experiment type listed as "other" <- another point to make about standardization

"expression profiling by high throughput sequencing" AND ("single cell" OR "single-cell" OR "scRNAseq" OR "scRNA-seq") gives 3994, slightly more

  5. single nuclei: do we want those too?

"expression profiling by high throughput sequencing" AND ("single nuclei" OR "single cell" OR "single-cell" OR "scRNAseq" OR "scRNA-seq" OR "snRNAseq" OR "snRNA-seq") gets 4025