Open robertamig opened 2 years ago
here is my solution. It is bio-related. I start with a list of vectors. In my case a list of genes belonging to different signatures.
str(list_filter)
List of 9
$ CellAge_Induces : chr [1:153] "AAK1" "ABI3" "ADCK5" "AGT" ...
$ CellAge_Inhibits : chr [1:121] "ACLY" "AKR1B1" "ASPH" "ATF7IP" ...
$ CLASSICAL_SASP : chr [1:38] "BGN" "CCL2" "CCL20" "COL1A1" ...
$ FRIDMAN_SENESCENCE_UP : chr [1:77] "ALDH1A3" "CCND1" "CD44" "CDKN1A" ...
$ ISM_SCORE : chr [1:128] "HSH2D" "OTOF" "TRIM69" "PSME1" ...
$ MOSERLE_IFNA_RESPONSE : chr [1:31] "CD274" "CMPK2" "CXCL10" "DDX58" ...
$ REACTOME_SENESCENCE_SASP: chr [1:110] "ANAPC1" "ANAPC10" "ANAPC11" "ANAPC15" ...
$ SAEPHIA_CURATED_SASP : chr [1:38] "IL1A" "IL1B" "CXCL10" "CXCL1" ...
$ senmayo : chr [1:125] "ACVR1B" "ANG" "ANGPT1" "ANGPTL4" ...
From this list, I generate two tables: One with the unique gene names
df2 <- data.frame(gene=unique(unlist(list_filter)))
head(df2)
gene
1 AAK1
2 ABI3
3 ADCK5
4 AGT
5 AKT1
6 ALOX15B
dim(df2)
[1] 671 1
One is simply a "dataframe" version of the list. With every gene in the signature and the name of every signature.
df1 <- lapply(list_filter,function(x){
data.frame(gene = x)
}) %>%
bind_rows(.id = "path")
head(df1)
path gene
1 CellAge_Induces AAK1
2 CellAge_Induces ABI3
3 CellAge_Induces ADCK5
4 CellAge_Induces AGT
5 CellAge_Induces AKT1
6 CellAge_Induces ALOX15B
dim(df1)
[1] 821 2
now I iterate the search of each unique gene name and save the identity of the signatures in a column.
df_int <- lapply(df2$gene,function(x){
# pull the name of the intersections
intersection <- df1 %>%
dplyr::filter(gene==x) %>%
arrange(path) %>%
pull("path") %>%
paste0(collapse = "|")
# build the dataframe
data.frame(gene = x,int = intersection)
}) %>%
bind_rows()
head(df_int,n=20)
gene int
1 AAK1 CellAge_Induces
2 ABI3 CellAge_Induces
3 ADCK5 CellAge_Induces
4 AGT CellAge_Induces
5 AKT1 CellAge_Induces
6 ALOX15B CellAge_Induces
7 AR CellAge_Induces
8 ARPC1B CellAge_Induces
9 ASF1A CellAge_Induces
10 AXL CellAge_Induces|senmayo
11 BHLHE40 CellAge_Induces
12 BLK CellAge_Induces
13 BRAF CellAge_Induces
14 BRD7 CellAge_Induces
15 CAV1 CellAge_Induces
16 CCND1 CellAge_Induces|FRIDMAN_SENESCENCE_UP
17 CDK18 CellAge_Induces
18 CDKN1A CellAge_Induces|FRIDMAN_SENESCENCE_UP|REACTOME_SENESCENCE_SASP
19 CDKN1C CellAge_Induces|FRIDMAN_SENESCENCE_UP
20 CDKN1B CellAge_Induces|REACTOME_SENESCENCE_SASP
dim(df_int)
[1] 671 2
the dataframe can be summarised and compared to the output provided by calling
df_int %>%
group_by(int) %>%
summarise(n=n()) %>%
arrange(desc(n))
# A tibble: 47 × 2
int n
<chr> <int>
1 CellAge_Induces 126
2 CellAge_Inhibits 110
3 REACTOME_SENESCENCE_SASP 95
4 ISM_SCORE 93
5 senmayo 77
6 FRIDMAN_SENESCENCE_UP 44
7 ISM_SCORE|MOSERLE_IFNA_RESPONSE 27
8 CLASSICAL_SASP|senmayo 12
9 CLASSICAL_SASP 8
10 SAEPHIA_CURATED_SASP 8
# … with 37 more rows
# ℹ Use `print(n = ...)` to see more rows
upset(fromList(list_filter),nsets = 10)
ComplexHeatmap has a function for this: extract_comb()
https://jokergoo.github.io/ComplexHeatmap-reference/book/upset-plot.html#upset-utility-functions
Hello! Would it be possible to add a feature to extract the lists of elements making up the intersections, or also the elements unique to each group when plotting an UpsetR plot?