sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

"upset" visualizations of intersections between genomes #1234

Open ctb opened 3 years ago

ctb commented 3 years ago

this might be of use to people looking to grok genome overlaps etc.

notebook permalink / notebook latest (but link may break ;)

ctb commented 3 years ago

here are some pretty pictures:

[four screenshots, taken 2020-11-03]

taylorreiter commented 2 years ago

I just wanted to make a note that twice in two months, I've used upset plots of a few signatures as an alternative to containment to understand what's happening in my systems.

One of them is here: https://github.com/taylorreiter/2021-paper-metapangenomes/issues/5#issuecomment-1055688521 [image]

And the other is here: [image]

My workflow for making this is:

  1. convert a set of signatures to CSV format, using a script like this: https://github.com/taylorreiter/2021-metapangenome-example/blob/main/scripts/sig_to_csv.py
  2. read the csvs into R
  3. use UpSetR and ComplexUpset to make the plots. Code below:
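```r
library(readr)
library(dplyr)
library(ComplexUpset)  # UpSetR must also be installed; it is called as UpSetR::fromList below

# accession + species string shared by all three input file names
acc_db <- "GCA_000162535.1-s__Parabacteroides_distasonis"

# each CSV is read as a single column, named after its source
metabat <- read_csv(paste0("outputs/metabat2_prokka_sigs_all/", acc_db, "_all_kmers.csv"),
                    col_names = "metabat")
kmers <- read_csv(paste0("outputs/nbhd_sigs_species_all/", acc_db, "_all_kmers.csv"),
                  col_names = "kmers")
roary <- read_csv(paste0("outputs/roary_sigs_all/", acc_db, "_pan_genome_reference_all_genes.csv"),
                  col_names = "roary")

# build a 0/1 membership table: one row per value, one column per set
upset_df <- UpSetR::fromList(list(metabat2 = metabat$metabat,
                                  kmers = kmers$kmers,
                                  roary = roary$roary))
conditions <- c("metabat2", "kmers", "roary")
# store the plot under its own name so it does not mask ComplexUpset::upset()
upset_plot <- upset(upset_df, intersect = conditions)
```
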
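
For anyone who wants to sanity-check the numbers behind a plot like this without R, here is a minimal Python sketch of the counting that an upset plot displays: for each combination of sets, how many values fall in exactly those sets and no others. It also computes containment for comparison. The function names and the toy hash values are made up for illustration; they are not part of sourmash or of the workflow above.

```python
def exclusive_intersections(named_sets):
    """Count the values that belong to exactly each combination of sets.

    Returns a dict mapping a sorted tuple of set names to the number of
    values found in those sets and in no others (one bar per key in an
    upset plot drawn in exclusive mode).
    """
    # Record, for every value, which sets contain it.
    membership = {}
    for name, values in named_sets.items():
        for value in values:
            membership.setdefault(value, set()).add(name)

    # Tally values by their exact membership pattern.
    counts = {}
    for names in membership.values():
        key = tuple(sorted(names))
        counts[key] = counts.get(key, 0) + 1
    return counts


def containment(a, b):
    """Fraction of set a that is also present in set b."""
    return len(a & b) / len(a) if a else 0.0


# Toy stand-ins for the hash columns read from the three CSV files.
sketches = {
    "metabat2": {1, 2, 3, 4},
    "kmers": {3, 4, 5, 6, 7},
    "roary": {4, 7, 8},
}

counts = exclusive_intersections(sketches)
for combo, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(" & ".join(combo), n)

print("containment of metabat2 in kmers:",
      containment(sketches["metabat2"], sketches["kmers"]))
```

Containment collapses each pair of sets into one number, while the per-combination counts show where the shared and unshared values sit across all sets at once, which is the extra information the upset plot gives.
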
ctb commented 2 years ago

ref https://github.com/sourmash-bio/sourmash/issues/348

ctb commented 3 months ago

adding the upset command via the betterplot plugin in https://github.com/sourmash-bio/sourmash_plugin_betterplot/pull/35 - it produces figures like this:

[image: 10sketches upset]