wikipathways / pathway-figure-ocr

Extracting gene sets from published pathway figures
Apache License 2.0

MSigDb unique gene count #27

Closed: AlexanderPico closed this issue 3 years ago

AlexanderPico commented 3 years ago

This page (https://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp?collection=C2) has 11 signatures with "Alzheimer" in their name.

Click on each to find a "download gene set" option, e.g., GMT and TXT formats.

Can you help get a count for unique genes across these 11 sets?
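A minimal sketch of one way to get that count, assuming each set has been downloaded in GMT format (one tab-separated line per set: name, description, then gene symbols). The helper names `read_gmt` and `count_unique_genes` are hypothetical and not part of this repository.

```python
def read_gmt(path):
    """Parse a GMT file into a dict of {set_name: set_of_gene_symbols}."""
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                # Skip blank or malformed lines (need name, description, >= 1 gene).
                continue
            name, _description, *genes = fields
            # Drop empty strings left by trailing tabs.
            gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def count_unique_genes(gmt_paths):
    """Return the size of the union of all genes across the given GMT files."""
    all_genes = set()
    for path in gmt_paths:
        for genes in read_gmt(path).values():
            all_genes |= genes
    return len(all_genes)
```

Calling `count_unique_genes` with the list of downloaded `.gmt` paths returns the combined unique count. Using a set union means a gene shared by several signatures is counted once.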

AlexanderPico commented 3 years ago

These signatures contain only 3,077 unique genes.

AlexanderPico commented 3 years ago

Let's just count the C2/CP collection. The other "CGP" sets are just DE gene lists from perturbation experiments and are thus not comparable to curated pathways.

https://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp?collection=CP

khanspers commented 3 years ago

There are 4 sets in C2_CP for Alzheimer's, from 4 sources (WP, KEGG, BIOCARTA and REACTOME). Total unique genes: 89. Spreadsheet here: https://www.dropbox.com/s/2hel6rllbsj0juf/MSigDB_Counts_C2-CP.xlsx?dl=0 And a screenshot of the summary here:

[Screenshot: summary of the counts, captured 2021-05-27 at 4:44:12 PM]

AlexanderPico commented 3 years ago

Wait, how is the total unique count only 89 when KEGG and WP have 166 and 150 genes, respectively?

khanspers commented 3 years ago

Updated: the combined unique count is 257. The spreadsheet has been updated.

(The 89 count was the number of duplicates, sorry.)
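The mix-up between the two numbers can be illustrated with a small sketch. It assumes "dups" means genes that appear in more than one set, and it uses made-up toy gene sets rather than the real MSigDB data. The function name `summarize_overlap` is hypothetical.

```python
from collections import Counter


def summarize_overlap(gene_sets):
    """Count the union of genes and the genes shared by two or more sets."""
    # For each gene, count how many sets contain it.
    membership = Counter(
        gene for genes in gene_sets.values() for gene in set(genes)
    )
    duplicated = {gene for gene, n_sets in membership.items() if n_sets > 1}
    return {
        "combined_unique": len(membership),  # size of the union
        "duplicated": len(duplicated),       # genes present in 2+ sets
    }


# Toy data only; these are not the real Alzheimer's gene sets.
toy_sets = {
    "SOURCE_A": {"G1", "G2", "G3"},
    "SOURCE_B": {"G2", "G3", "G4"},
    "SOURCE_C": {"G3", "G5"},
}
summary = summarize_overlap(toy_sets)
print(summary)  # {'combined_unique': 5, 'duplicated': 2}

# Sanity check: the union can never be smaller than the largest single set.
largest = max(len(genes) for genes in toy_sets.values())
assert summary["combined_unique"] >= largest
```

The final assertion is the same check that flagged the 89 figure: a combined unique count cannot be lower than the size of the largest contributing set.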

khanspers commented 3 years ago

[Screenshot captured 2021-05-28 at 10:01:17 AM]