jolars / eulerr

Area-Proportional Euler and Venn Diagrams with Ellipses
https://jolars.github.io/eulerr/
GNU General Public License v3.0

Missing overlaps and areas in euler vs venn #102

Closed hermidalc closed 1 year ago

hermidalc commented 1 year ago

I thought that Euler diagrams drawn with ellipses could represent all relationships for up to 5 sets. I have a 4-way case where eulerr is missing an overlap compared to the Venn diagram (and sometimes an area, see further below).

Here's my case; see the attached files to reproduce:

All-genes-H-bilis-ATCC-51630.csv All-genes-H-hepaticus-ATCC-51449.csv All-genes-A-muciniphila-ATCC-BAA-835.csv All-genes-H-pullorum-NCTC13154.csv

library(eulerr)

eulerr_options(
    main = list(fontsize = 8, lineheight = 2.0),
    labels = list(fontsize = 6, lineheight = 1.5),
    quantities = list(fontsize = 6)
)

# genes
hbil_gene_df <- read.delim("All-genes-H-bilis-ATCC-51630.csv")
hhep_gene_df <- read.delim("All-genes-H-hepaticus-ATCC-51449.csv")
amuc_gene_df <- read.delim("All-genes-A-muciniphila-ATCC-BAA-835.csv")
hpul_gene_df <- read.delim("All-genes-H-pullorum-NCTC13154.csv")

hbil_gene_df <- hbil_gene_df[!grepl("^T368_", hbil_gene_df$Gene.Name), , drop = FALSE]
hhep_gene_df <- hhep_gene_df[!grepl("^HH_", hhep_gene_df$Gene.Name), , drop = FALSE]
amuc_gene_df <- amuc_gene_df[!grepl("^AMUC_", amuc_gene_df$Gene.Name), , drop = FALSE]
hpul_gene_df <- hpul_gene_df[!grepl("^EL247_", hpul_gene_df$Gene.Name), , drop = FALSE]

gene_combi <- data.frame(
    row.names = union(
        hbil_gene_df$Gene.Name,
        union(
            hhep_gene_df$Gene.Name,
            union(
                amuc_gene_df$Gene.Name,
                hpul_gene_df$Gene.Name
            )
        )
    )
)
gene_combi$A <- row.names(gene_combi) %in% hbil_gene_df$Gene.Name
gene_combi$B <- row.names(gene_combi) %in% hhep_gene_df$Gene.Name
gene_combi$C <- row.names(gene_combi) %in% amuc_gene_df$Gene.Name
gene_combi$D <- row.names(gene_combi) %in% hpul_gene_df$Gene.Name

plot(
    euler(gene_combi, shape = "ellipse"),
    main = "Genes",
    quantities = list(type = "counts", cex = 1.5),
    labels = list(
        labels = c("H. bilis", "H. hepaticus", "A. muciniphila", "H. pullorum"),
        cex = 1.6
    ),
    fills = c("plum3", "indianred1", "gold", "lightsalmon"),
    adjust_labels = TRUE
)

plot(
    venn(gene_combi),
    main = "Genes",
    quantities = list(type = "counts", cex = 1.5),
    labels = list(
        labels = c("H. bilis", "H. hepaticus", "A. muciniphila", "H. pullorum"),
        cex = 1.6
    ),
    fills = c("plum3", "indianred1", "gold", "lightsalmon"),
    adjust_labels = TRUE
)

The Euler plot is missing the overlap region containing the 3 genes that are in H. bil + H. hep + A. muc but not in H. pul. The Venn diagram shows it.

4w_euler 4w_venn
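Before comparing the two plots, it can help to count the region directly from the membership table, so the expected number is known independently of either diagram. The following is a minimal sketch on a small hypothetical data frame (`gene_combi_toy` and its gene names are made up for illustration, not taken from the attached files); the same expression should apply to `gene_combi` from the code above.

```r
# Hypothetical stand-in for gene_combi: one row per gene, one logical column per set.
gene_combi_toy <- data.frame(
    A = c(TRUE, TRUE, TRUE, TRUE, FALSE),
    B = c(TRUE, TRUE, TRUE, FALSE, TRUE),
    C = c(TRUE, TRUE, FALSE, FALSE, TRUE),
    D = c(FALSE, TRUE, FALSE, FALSE, FALSE),
    row.names = c("g1", "g2", "g3", "g4", "g5")
)

# Genes that are in A, B and C but not in D (the region in question).
abc_not_d <- with(gene_combi_toy, A & B & C & !D)

n_abc_not_d <- sum(abc_not_d)                        # number of genes in the region
genes_abc_not_d <- rownames(gene_combi_toy)[abc_not_d]  # which genes they are

print(n_abc_not_d)
print(genes_abc_not_d)
```

If the count obtained this way matches the Venn diagram but the region is absent from the Euler diagram, the difference comes from the fitted layout rather than from the input data.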

A second, somewhat worrying issue: when I draw the Euler diagram again, whether in a different session or in the same one, the result is not deterministic. It seems to miss different A. muc areas/overlaps. Here is the result of running the same code again. Now it is missing two areas: the same 3-gene overlap and the 2-gene A. muc-only area.

4w_euler
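The run-to-run variation described above is consistent with a fitting procedure that starts from random initial values. Assuming that is the cause here (the thread does not confirm it), fixing R's random seed immediately before each call to `euler()` may make the layout reproducible. The sketch below uses a small hypothetical data frame `toy` rather than the attached gene lists.

```r
library(eulerr)

# Hypothetical membership table (not the real gene data).
toy <- data.frame(
    A = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
    B = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
    C = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Fix the seed before each fit so both fits draw the same random numbers.
set.seed(1)
fit1 <- euler(toy, shape = "ellipse")

set.seed(1)
fit2 <- euler(toy, shape = "ellipse")

# Compare the fitted region areas of the two runs.
same_fit <- isTRUE(all.equal(fit1$fitted.values, fit2$fitted.values))
print(same_fit)
```

A fixed seed only makes a given layout repeatable; it does not make that layout more accurate, so trying a few different seeds and keeping the fit with the lowest error is a separate step.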

jolars commented 1 year ago

I thought with Euler diagrams up to 5-way it can represent all relationships using ellipses. I have a 4-way where eulerr is missing an overlap?

No, that's not the case. There's no guarantee that a 4-way relationship can be represented by ellipses. Maybe you're thinking of Venn diagrams?
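Since an exact ellipse layout is not guaranteed, one way to see which regions a particular fit failed to represent is to compare the input areas with the fitted ones. This is a sketch under assumptions: the counts in `combo` are hypothetical, and treating a fitted area that rounds to zero as "dropped" is an arbitrary choice made for illustration.

```r
library(eulerr)

# Hypothetical disjoint region counts for four sets (not the gene data).
combo <- c(
    "A" = 30, "B" = 25, "C" = 5, "D" = 20,
    "A&B" = 15, "A&D" = 8, "B&D" = 6,
    "A&B&C" = 3, "A&B&D" = 10, "A&B&C&D" = 12
)

set.seed(1)
fit <- euler(combo, shape = "ellipse")

# Side-by-side table of requested areas, fitted areas and their difference.
cmp <- data.frame(
    original = fit$original.values,
    fitted   = round(fit$fitted.values, 2),
    residual = round(fit$residuals, 2)
)
print(cmp)

# Regions that have counts in the input but (almost) no area in the layout.
dropped <- rownames(cmp)[cmp$original > 0 & cmp$fitted == 0]
print(dropped)

# Overall goodness-of-fit summaries stored on the fitted object.
print(fit$stress)
print(fit$diagError)
```

Any region listed in `dropped` would appear in a Venn diagram, which always draws every intersection, but would be missing from the Euler plot of this fit.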

On Tue Oct 4, 2022 at 6:17 PM CEST, Leandro Hermida wrote:

I thought with Euler diagrams up to 5-way it can represent all relationships using ellipses. I have a 4-way where eulerr is missing an overlap?


hermidalc commented 1 year ago

I thought with Euler diagrams up to 5-way it can represent all relationships using ellipses. I have a 4-way where eulerr is missing an overlap?

No, that's not the case. There's no guarantee that a 4-way relationship can be represented by ellipses. Maybe you're thinking of Venn diagrams?

Thank you, I must’ve misread it somewhere. I'll close this issue.