mikemc / phyloseqpp

Phyloseq extensions and functions for tidier analysis of microbiome data

Create an `import_qza()` function for importing QIIME2 zip artifacts #18

Open mikemc opened 5 years ago

mikemc commented 5 years ago

Idea: Suppose `qza_file` is the path to a `.qza` file.

  1. Parse the metadata.yaml file in the top directory to see what type of data is contained
  2. Unzip the files in the data/ folder to a temp directory
  3. Read these files in using the correct function based on their extension and/or what we learned in Step 1
  4. Return the object as-is (a biom object from the biomformat package, a DNAStringSet, or a data frame) or convert it to the corresponding phyloseq object
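
Steps 1 and 3 might be sketched as follows, using only base R. The helper names `parse_qza_type()`, `qza_type()`, and `qza_kind()` are hypothetical. The sketch assumes that `metadata.yaml` sits inside a single root folder of the archive and contains a line of the form `type: FeatureTable[Frequency]`; the three semantic types it recognizes are illustrative, not exhaustive.

```r
# Hypothetical helper: pull the `type:` field out of the lines of metadata.yaml.
parse_qza_type <- function(lines) {
    hit <- grep("^type:", lines, value = TRUE)
    if (length(hit) == 0) return(NA_character_)
    # Drop the key, then any surrounding whitespace and quotes
    type <- sub("^type:\\s*", "", hit[1])
    gsub("^['\"]|['\"]$", "", trimws(type))
}

# Hypothetical helper: read the semantic type straight from the zip archive.
qza_type <- function(qza_file) {
    members <- unzip(qza_file, list = TRUE)$Name
    # Assumption: metadata.yaml is one level down, inside the root folder
    meta <- grep("^[^/]+/metadata\\.yaml$", members, value = TRUE)
    stopifnot(length(meta) == 1)
    con <- unz(qza_file, meta)
    on.exit(close(con))
    parse_qza_type(readLines(con))
}

# Hypothetical helper: decide which kind of reader a semantic type calls for.
qza_kind <- function(type) {
    if (is.na(type)) return("unknown")
    if (startsWith(type, "FeatureTable[")) return("biom")
    if (startsWith(type, "FeatureData[Sequence]")) return("fasta")
    if (startsWith(type, "FeatureData[Taxonomy]")) return("taxonomy")
    "unknown"
}
```

An `import_qza()` function could then call `qza_kind(qza_type(qza_file))` and hand off to the matching reader, falling back to the file extension when the type is not recognized.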
mikemc commented 5 years ago

For .qza files that contain an OTU table,

```r
library(dplyr)
library(stringr)
library(biomformat)
library(phyloseq)

tmp <- tempfile()
dir.create(tmp)
# Get the path to the biom file within the qza archive
flist <- unzip(qza_file, list = TRUE) %>%
    as_tibble
biom_file <- flist %>%
    filter(str_detect(Name, "\\.biom$")) %>%
    pull(Name)
# Unzip and load just the biom file
unzip(qza_file, files = biom_file, exdir = tmp)
bm <- file.path(tmp, biom_file) %>%
    read_biom()
# Convert the counts to a dense matrix and wrap them as a phyloseq OTU table
otu <- biom_data(bm) %>%
    as("matrix") %>%
    otu_table(taxa_are_rows = TRUE)
```

For .qza files that contain reference sequences,

```r
library(dplyr)
library(stringr)

tmp <- tempfile()
dir.create(tmp)
# Get the path to the fasta file within the qza archive
flist <- unzip(qza_file, list = TRUE) %>%
    as_tibble
fasta_file <- flist %>%
    filter(str_detect(Name, "\\.fasta$")) %>%
    pull(Name)
# Unzip and load just the fasta file
unzip(qza_file, files = fasta_file, exdir = tmp)
rs <- file.path(tmp, fasta_file) %>%
    Biostrings::readDNAStringSet()
```
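
The OTU-table and reference-sequence cases share the same "find one member, extract it to a temp directory" step, so it could be factored out. Below is a minimal base-R sketch; `pick_member()` and `extract_qza_member()` are hypothetical names, and the sketch assumes exactly one archive member should match the pattern.

```r
# Hypothetical helper: choose the single member name that matches `pattern`.
pick_member <- function(members, pattern) {
    hit <- grep(pattern, members, value = TRUE)
    if (length(hit) != 1) {
        stop("Expected exactly one match for '", pattern, "', found ", length(hit))
    }
    hit
}

# Hypothetical helper: extract that member into a fresh temp directory
# and return the path to the extracted file.
extract_qza_member <- function(qza_file, pattern) {
    members <- unzip(qza_file, list = TRUE)$Name
    member <- pick_member(members, pattern)
    tmp <- tempfile()
    dir.create(tmp)
    unzip(qza_file, files = member, exdir = tmp)
    file.path(tmp, member)
}
```

With this in place, the two cases would reduce to `extract_qza_member(qza_file, "\\.biom$")` and `extract_qza_member(qza_file, "\\.fasta$")`, each followed by the appropriate reader.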

For taxonomy generated in QIIME2 and exported to csv (CHECK)

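```r
library(tidyverse)
library(phyloseq)

# Assumption: the feature IDs are in a column named "Feature ID" (CHECK)
tax <- path_to_taxonomy.csv %>%
    read_csv(comment = "#") %>%
    select(-Confidence) %>%
    mutate(Taxon = map(Taxon, ~parse_taxonomy_qiime(.) %>% enframe)) %>%
    unnest(Taxon) %>%
    # Ranks with only a prefix (e.g. "s__") parse to ""; mark them as missing
    mutate_at("value", ~ifelse(. == "", NA, .)) %>%
    pivot_wider %>%
    # Move the IDs to row names so they do not end up as a taxonomy column
    column_to_rownames("Feature ID") %>%
    as.matrix %>%
    tax_table
```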
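
To make the parsing step concrete, here is a dependency-free sketch of what splitting a single taxonomy string into ranks looks like. `split_taxon()` is a hypothetical helper written for illustration, not phyloseq's implementation, and it assumes Greengenes-style prefixes (`k__`, `p__`, and so on) separated by semicolons.

```r
# Hypothetical helper: split one taxonomy string into a named vector of ranks.
split_taxon <- function(taxon) {
    ranks <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
               f = "Family", g = "Genus", s = "Species")
    parts <- trimws(strsplit(taxon, ";", fixed = TRUE)[[1]])
    prefix <- sub("__.*$", "", parts)      # "k", "p", ...
    value <- sub("^[a-z]__", "", parts)    # the name without its prefix
    value[value == ""] <- NA_character_    # a bare prefix means "unassigned"
    names(value) <- unname(ranks[prefix])
    value
}

split_taxon("k__Bacteria; p__Firmicutes; g__")
```

The last element comes back as `NA` under the name `Genus`, which mirrors the empty-string-to-`NA` step in the pipeline above.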
mikemc commented 4 years ago

We should also have functions that import QIIME2-exported TSV files.

For sample metadata, the variable types can be taken from the second line of the file when it is a `#q2:types` row, which marks each column as `categorical` or `numeric`.
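
A minimal base-R sketch of that idea follows. `read_q2_metadata()` is a hypothetical name, and the sketch assumes a tab-separated file whose first line is the header and whose second line, if it starts with `#q2:types`, gives the type of each column. Columns marked `numeric` are converted; everything else stays character.

```r
# Hypothetical helper: read a QIIME 2 sample-metadata TSV and apply the
# column types from an optional "#q2:types" second line.
read_q2_metadata <- function(path) {
    lines <- readLines(path)
    header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
    has_types <- length(lines) >= 2 && startsWith(lines[2], "#q2:types")
    # Skip the header, plus the types row when present
    n_skip <- if (has_types) 2 else 1
    body <- lines[-seq_len(n_skip)]
    md <- read.delim(text = body, header = FALSE, col.names = header,
                     colClasses = "character", check.names = FALSE)
    if (has_types) {
        types <- strsplit(lines[2], "\t", fixed = TRUE)[[1]]
        # Convert only the columns explicitly marked as numeric
        for (j in which(types == "numeric")) {
            md[[j]] <- as.numeric(md[[j]])
        }
    }
    md
}
```

The result is a plain data frame, so it could be passed to `phyloseq::sample_data()` after moving the ID column to the row names.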