joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Importing Qiime2 biom file #821

Open sbudree opened 7 years ago

sbudree commented 7 years ago

Hi,

I have created a feature table using Qiime2 and have exported this as a biom file. However, this biom file cannot be imported into phyloseq:

Error in `colnames<-`(`*tmp*`, value = c("ta1", "ta0")) : length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Can you advise me on how to import the Qiime2 biom file into phyloseq?

joey711 commented 7 years ago

Can you share the file?

sbudree commented 7 years ago

Hi,

File attached.

-Shrish

joey711 commented 7 years ago

Actually, can you post a link? Although you've responded via email, this is actually still on the phyloseq issues tracker:

https://github.com/joey711/phyloseq/issues/821

so email attachments don't work. Any file-hosting site will do. Some (like dropbox) make it easy to stop sharing the file after a certain amount of time, or when you say so.

sbudree commented 7 years ago

Hi,

Link to biom file: https://www.dropbox.com/s/16qsa311jn2bjkb/feature-table.biom?dl=0

ViridianaAvila commented 7 years ago

Hi, today I had the same error. I realized that, for some reason, biom files generated by QIIME2 do not include the taxonomy. I had to extract the taxonomy and the tree from the qza files, merge the taxonomy into a tsv version of the biom table, and then convert that back to a new biom file. That solved my issue; I hope this information is useful. Here is what I did:

  1. Uncompress the qza files (table, tree and taxonomy). unzip will do.

  2. Go into the folder of the uncompressed table; you will find a feature-table.biom file.

  3. To make feature-table.biom easy to manipulate, convert it to .txt using: biom convert -i feature-table.biom -o otu_table.txt --to-tsv

  4. Open this table in Excel, R, or anything else where you can merge in the taxonomy information by OTU ID, using the taxonomy.tsv file. (This taxonomy.tsv file comes from the decompression of taxonomy.qza.)

  5. Once the taxonomy info is merged as a column in otu_table.txt, convert the file again using the following command: biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy

  6. Load it in R: biom_otu <- import_biom(BIOMfilename = "new_otu_table.biom", treefilename = "tree.nwk") (The tree file comes from decompressing unrooted-tree.qza or rooted-tree.qza.)
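Step 4 (merging the taxonomy into the table by OTU ID) can also be done on the command line. The following is only a minimal sketch with awk, not part of the original recipe. It assumes the TSV written by biom convert starts with a comment line followed by a "#OTU ID" header row, and that taxonomy.tsv has the columns Feature ID, Taxon, Confidence. The sample rows, the "Unassigned" placeholder, and the output file name are made up for illustration.

```shell
#!/bin/sh
# Sketch: append a "taxonomy" column to a biom-exported TSV by matching IDs.
# All sample rows below are made up for illustration.
workdir=$(mktemp -d)

# Assumed layout of `biom convert --to-tsv` output: a comment line, then "#OTU ID".
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\notu1\t5.0\t0.0\notu2\t3.0\t7.0\n' \
    > "$workdir/otu_table.txt"

# Assumed layout of the taxonomy.tsv exported from taxonomy.qza.
printf 'Feature ID\tTaxon\tConfidence\notu1\tk__Bacteria; p__Firmicutes\t0.99\notu2\tk__Bacteria; p__Proteobacteria\t0.87\notu3\tk__Archaea\t0.75\n' \
    > "$workdir/taxonomy.tsv"

awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) tax[$1] = $2; next }   # first file: remember ID -> Taxon
    /^#OTU ID/ { print $0, "taxonomy"; next }        # header row gets the new column name
    /^#/ { print; next }                             # other comment lines pass through
    { print $0, (($1 in tax) ? tax[$1] : "Unassigned") }
' "$workdir/taxonomy.tsv" "$workdir/otu_table.txt" > "$workdir/otu_table_with_tax.txt"

cat "$workdir/otu_table_with_tax.txt"
```

Taxonomy rows with no matching OTU (otu3 here) are simply ignored, so the merged table keeps exactly the rows of the original table.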

ksdiaz commented 7 years ago

Hello,

I've been having this similar issue as well. I can't use the biom convert circumvention because biom throws an error when trying to do Step 5 ViridianaAvila suggested ("TypeError: can only join an iterable", I suspect this has to do with the qiime format itself, as trying to validate the original biom file from the qiime qza returns it as an invalid file). So, I tried adding the taxonomy as a new column into the converted text file. Trying to load this into phyloseq like this:

otufile <- "feature-table-taxonomy.txt"
mapfile <- "phylGastrotricha_mapping.tsv"
treefile <- "phylGastro_tree.nwk"
qiimetable <- import_qiime(otufile, mapfile, treefile, parseFunction = parse_taxonomy_qiime)

gives me the following error:

Processing map file...
Processing otu/tax file...
Reading file into memory prior to parsing...
Detecting first header line...
Header is on line 2  
Converting input file to a table...
Defining OTU table... 
Adding new column 'Consensus Lineage' then assigning NULL (deleting it).Adding new column
 '#OTU ID' then assigning NULL (deleting it).Parsing taxonomy table...
Error in taxlist[[i]] : subscript out of bounds

Traceback in Rstudio shows me this:

Error in taxlist[[i]] : subscript out of bounds 
3. build_tax_table(taxlist) 
2. import_qiime_otu_tax(otufilename, parseFunction, verbose = verbose) 
1. import_qiime(otufile, mapfile, treefile, parseFunction = parse_taxonomy_qiime)

I've attached the table I was trying to import.

feature-table-taxonomy.txt

jme6f4 commented 7 years ago

Hi, Just came across this and had the same problem. I'm able to run the "moving pictures" tutorial fine (otu_table_mc2_w_tax_no_pynast_failures.biom), but not my own data. The only difference that I can spot is that my .biom file was produced by QIIME2.

jackmen commented 6 years ago

Hi guys,

the approach of @ViridianaAvila works and is the one I have been using quite often in the past. Be aware that you have to give a header to the taxonomy column in your otu_table.txt, and this header should be the same name as the one given after "--process-obs-metadata" in the command. @ksdiaz : Your taxonomy has no column header. Name the column header for the taxonomy column in the otu_table.txt "taxonomy" when using this command:

biom convert -i otu_table.txt -o new_otu_table.biom --to-hdf5 --table-type="OTU table" --process-obs-metadata taxonomy

@jme6f4 : You might want to check if you made the same mistake.

Then in phyloseq simply do:

biom <- import_biom("new_otu_table.biom", parseFunction = parse_taxonomy_greengenes)
map <- import_qiime_sample_data("mapping_file.txt")
tree <- read_tree_greengenes("tree.nwk")
class <- merge_phyloseq(biom, map, tree)

Also: the qiime2 taxonomy column header in taxonomy.tsv is "Taxon", and if it is only pasted into otu_table.txt it is not recognized. I guess changing to "obs-metadata Taxon" should also work fine.

Good luck!

HRRTPH commented 6 years ago

Hi everyone, I am trying to import Qiime2 output files into Phyloseq. My commands are here:

otufile = system.file("extdata", "feature-table.biom", package="phyloseq")
mapfile = system.file("extdata", "G3_metadata.txt", package="phyloseq")
trefile = system.file("extdata", "GP_tree_rand_short.newick.gz", package="phyloseq")
rs_file = system.file("extdata", "dna-sequences.fasta", package="phyloseq")
qiimedata = import_qiime(otufile, mapfile, trefile, rs_file)

I used qiime tools export to get those files, except the mapfile, whose extension was changed from tsv to txt. After running qiimedata, I got an error:

Processing map file...
Error in read.table(file = mapfilename, header = TRUE, sep = "\t", comment.char = "") : no lines available in input
In addition: Warning message:
In file(file, "rt") : file("") only supports open = "w+" and open = "w+b": using the former

Does anyone know this error?

Please help!!!

Thank you very much,

Toan
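A note on the error above: the warning mentions file(""), which suggests that an empty path reached read.table. system.file() looks inside the installed package's own directory and returns an empty string when the file is not there, so files exported from QIIME2 would normally be passed as plain paths instead. Separately, import_qiime() expects the legacy QIIME tab-delimited OTU table, so a .biom file would normally go through import_biom(). Before starting R, a quick check that the files exist and are non-empty can rule out path problems. This is only a sketch; check_inputs is a hypothetical helper, and the file names are taken from the comment above.

```shell
#!/bin/sh
# Sketch: report which input files are missing or empty before importing them in R.
# check_inputs is a hypothetical helper, not part of QIIME2 or phyloseq.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

# File names as used in the comment above; adjust to your own exports.
check_inputs feature-table.biom G3_metadata.txt dna-sequences.fasta \
    || echo "fix the paths before importing"
```

Run it from the directory that will be the R working directory, so that the same relative paths apply in both places.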

Biancabrown commented 6 years ago

Hello, I'm currently having the same issue. When I try to convert the txt file that I generated with the taxonomy header to a biom file I get the following error message "ValueError: could not convert string to float: NA" Any ideas?

Nourhanelsahly commented 6 years ago

hello, I have the same problem (importing biom to phyloseq). I followed the steps in @ViridianaAvila's and @jackmen's comments, and it went fine until converting the merged file (otu and taxonomy) to biom format again. The new biom file didn't contain the taxonomy column (I converted it to txt to check). So, it's not imported into phyloseq.

Do you have any clue please? I am attaching the merged txt file

output.txt

Ajsnevets commented 6 years ago

Hi not sure if you have sorted this out, however, I was having similar problems. This may not be your issue, but it was one of mine. In your output.txt file you have quotation marks for the boundaries of each cell, and the convert function can't recognise these. Instead of writing a .txt file in R I wrote it as .csv and used sep="\t" to get rid of the quotation marks. I then opened in excel and checked the headers were aligned and saved it as a .txt file. I then opened it in notepad and pasted #Constructed from BIOM file# at the start of the txt file as this was inserted by the software after the first convert and figured it might be important. At the moment you have "#OTU ID" and when you remove the brackets the # will probably stop it reading anything after it. Mine starts with this: #Constructed from BIOM file# OTU ID Sample1 Sample2 and so on. Once I had this format sorted everything else worked fine, good luck.
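The quotation marks described above come from R's write.table(), which quotes character columns and header names by default; writing with quote = FALSE avoids them at the source. If the file already exists, the quotes can also be stripped afterwards. A minimal sketch on a made-up two-line table (the output file name is an assumption):

```shell
#!/bin/sh
# Sketch: remove the double quotes that write.table() adds by default.
workdir=$(mktemp -d)

# Made-up table in the quoted form: header names and character fields are quoted.
printf '"#OTU ID"\t"S1"\t"taxonomy"\n"otu1"\t5\t"k__Bacteria; p__Firmicutes"\n' \
    > "$workdir/output.txt"

# Drop every double quote; the fields stay tab-separated.
sed 's/"//g' "$workdir/output.txt" > "$workdir/output_noquotes.txt"

# Print the header row so the column names can be checked by eye.
head -n 1 "$workdir/output_noquotes.txt"
```

This blunt approach is only safe when no field legitimately contains a double quote, which is usually the case for OTU tables and taxonomy strings.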

brookeweigel commented 6 years ago

Hello! I am also having problems getting my QIIME2 data into phyloseq. So far, I tried all of the above steps, including making a new .biom file with OTU abundances + taxonomy. I had the same problem as @Nourhanelsahly, even after following the above changes from @Ajsnevets by changing the header to match the OTUID for each file. Please help! I would love to use phyloseq, and since QIIME2 is now widely used, I wish that it was a lot easier to transition from QIIME2 output to phyloseq. I really want to figure this out!

See .txt file below: table.from_biom.txt

See code:

biom add-metadata -i core-metrics-results/rarified-feature-table/feature-table.biom -o table-with-taxonomy.biom --observation-metadata-fp core-metrics-results/rarified-feature-table/taxonomy.tsv --sc-separated taxonomy

Apart from the issue of getting a .biom file with both taxonomy and OTU abundances, I am having trouble importing my data into phyloseq. See below.

otufile = system.file("extdata", "table-with-taxonomy.biom", package="phyloseq")
mapfile = system.file("extdata", "Sea_Cucumber_metadata.txt", package="phyloseq")
trefile = system.file("extdata", "tree.nwk", package="phyloseq")
qiimedata = import_qiime(otufile, mapfile, trefile)

Here is my mapping file: Sea_Cucumber_metadata.txt

I get this error:

Processing map file...
Error in read.table(file = mapfilename, header = TRUE, sep = "\t", comment.char = "") : no lines available in input
In addition: Warning message:
In file(file, "rt") : file("") only supports open = "w+" and open = "w+b": using the former

Ajsnevets commented 6 years ago

There is no taxonomy data in this table, so I guess you didn't manage to merge them? The first line of the table.from_biom.txt reads #[space]constructed from..[space]#[no space]OTU ID. This is one of several problems but the current placement of your [spaces] is going to stop it reading correctly, it has to be: #[no space]Constructed from BIOM file[no space]#[space]OTU ID Sample1 Sample2

Additionally though, the problem is that with your "table.from_biom.txt" file, the columns names are not lined up with the columns. Not sure how this got messed up.

(side note) Your tables seem to have been rarefied. I am no expert, but my understanding is that DESeq2 allows you to look at the data without rarefying it as in qiime2.

I would go back to the original table.qza file, unzip it, and go through the folders until you find the .biom file, then convert it with:

biom convert -i feature-table.biom -o otu_table.txt --to-tsv

Open the table in R, where you can merge in the taxonomy information by OTU ID using the taxonomy.tsv file. (This taxonomy.tsv file comes from the decompression of taxonomy.qza.) If you cannot open it in R (read.table(file = 'otu.tsv', sep = '\t', header = TRUE)), then it is probably because of the #constructed from...# problem, so open the file in Notepad, fix it, and then open it in R.

brookeweigel commented 6 years ago

Hi @Ajsnevets and everyone else trying to get QIIME2 data into phyloseq... after some exporting and merging in R (to get around the fact that after filtering out chloroplasts and mitochondria, my taxonomy files and OTU matrix have a different # of taxa), I was finally able to wrangle my QIIME2 data into Phyloseq without using any .biom files. Here is a pipeline that I wrote (see PDF below). It is pretty clunky and uses QIIME2 + R + excel, but at least it works!

It is also partially based on the answer from https://forum.qiime2.org/t/converting-biom-files-with-taxonomic-info-for-import-in-r-with-phyloseq/2542/5, which doesn't use any .biom files.

QIIME2_to_Phyloseq.pdf

laylaeb commented 6 years ago

@brookeweigel MANY thanks...you made my day! its working perfectly :)

kspeeriful commented 6 years ago

As an update to helpful instructions posted by @brookeweigel, you can shorten these steps by formatting the taxonomy file in R using the below commands:

Read in the .tsv version of the feature table, which should now have a column header "OTUID", not "#OTU ID"

features <- read.table(file="feature-table.txt", header=TRUE)
head(features)

Read in the .tsv version of the taxonomy table, which should also have a column header "OTUID", not "Feature ID"

tax <- read.table(file="taxonomy.tsv", sep='\t', header=TRUE)
head(tax)

Keep only the OTUIDs in tax that are also present in features; the ones that are missing from features need to be eliminated

The remaining table should have the same number of rows as the features data frame

tax_filtered <- tax[tax$OTUID %in% features$OTUID,]
head(tax_filtered)

Separate the "Taxon" column in the tax_filtered data frame by semicolon so that each step of the taxonomy (e.g., kingdom, phylum, class, etc.) is its own column (separate() comes from the tidyr package)

library(tidyr)
tax_filtered <- separate(tax_filtered, Taxon, c("Kingdom","Phylum","Class","Order", "Family", "Genus","Species"), sep= ";", remove=TRUE)

Write one outfile containing the OTUID and taxonomic info

write.csv(tax_filtered, file="taxonomy_phyloseq.csv")
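For anyone who prefers to do the same filtering and splitting outside R, here is a rough awk equivalent. It is a sketch under assumptions: both files are tab-separated with an "OTUID" header as described above, missing ranks are filled with "NA", and the sample rows and the output name taxonomy_phyloseq.tsv are made up. Unlike a plain split on ";", it also drops the spaces that follow each semicolon.

```shell
#!/bin/sh
# Sketch: keep only taxonomy rows whose OTUID occurs in the feature table,
# then split the Taxon string into seven rank columns.
workdir=$(mktemp -d)

# Made-up inputs with the renamed "OTUID" header.
printf 'OTUID\tS1\tS2\notu1\t5\t0\notu3\t2\t9\n' > "$workdir/feature-table.txt"
printf 'OTUID\tTaxon\tConfidence\notu1\tk__Bacteria; p__Firmicutes; c__Bacilli\t0.99\notu2\tk__Bacteria; p__Cyanobacteria; c__Chloroplast\t0.98\notu3\tk__Bacteria; p__Proteobacteria\t0.91\n' \
    > "$workdir/taxonomy.tsv"

awk -F'\t' -v OFS='\t' '
    # First file (feature table): collect the IDs to keep, skipping comments and the header.
    NR == FNR { if ($0 !~ /^#/ && $1 != "OTUID") keep[$1] = 1; next }
    # Second file (taxonomy): replace its header with one column per rank.
    FNR == 1 { print "OTUID", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"; next }
    ($1 in keep) {
        n = split($2, rank, /; */)       # split on ";" plus any following spaces
        line = $1
        for (i = 1; i <= 7; i++) line = line OFS ((i <= n) ? rank[i] : "NA")
        print line
    }
' "$workdir/feature-table.txt" "$workdir/taxonomy.tsv" > "$workdir/taxonomy_phyloseq.tsv"

cat "$workdir/taxonomy_phyloseq.tsv"
```

In this sample, otu2 (a chloroplast that was filtered out of the feature table) is dropped, so the result has one row per remaining feature plus the header.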

DeniRibicic commented 5 years ago

Hi guys,

I am having a problem while importing QIIME2 biom file into phyloseq.

So, I have my own bash script running QIIME2 analysis from raw reads to assigning taxonomy and exporting files of interest. Lastly taxonomy is added to the biom file after properly changing header of taxonomy.tsv file:

biom add-metadata -i exported/feature-table.biom -o exported/table-with-taxonomy.biom --observation-metadata-fp exported/taxonomy.tsv --sc-separated taxonomy

To make things short, I've been running hundreds and hundreds of samples from different projects this way pain-free. But only this one particular run gives me this "itch" with its biom file. Getting the following error while trying to import_biom:

Error in read_biom(biom_file = BIOMfilename) : 
  Both attempts to read input file:
exported/table-with-taxonomy.biom
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
In addition: Warning message:
In strsplit(msg, "\n") : input string 1 is invalid in this locale

I have double-, triple-, quadruple-checked my path and file names, so that's not the issue...

The additional warning message usually occurs when I am importing biom files, but it doesn't really affect my phyloseq workflow.

So, for some reason, it seems that the biom file itself could be corrupt. Does anyone have an idea how to inspect the biom file to check what might be wrong with it? If someone wants to give it a try as well, I have uploaded it to Google Drive:

https://drive.google.com/open?id=1n828eQmPWjfPpsURh1MH9KRMT0pi2rZl

There are 4 files; 1) original non-exported biom file; 3_table.qza, 2) exported biom file; feature-table.biom, 3) biom file with added taxonomy; table-with-taxonomy.biom and 4) taxonomy.tsv file

ps. when I continue using the .qza file in downstream QIIME2 analysis, it runs without a problem.

Any help would be appreciated, Deni
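One low-level way to inspect a biom file is to look at its first bytes: an HDF5 (BIOM v2) file starts with the fixed 8-byte HDF5 signature, while a JSON (BIOM v1) file normally starts with "{". The sketch below only classifies the container format and says nothing about whether the contents are valid. biom_kind is a hypothetical helper name, and the two sample files are stand-ins, not real tables.

```shell
#!/bin/sh
# Sketch: guess whether a .biom file is HDF5 (BIOM v2) or JSON (BIOM v1)
# from its leading bytes. biom_kind is a hypothetical helper.
biom_kind() {
    # First 8 bytes as a lowercase hex string without spaces.
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        894844460d0a1a0a) echo "hdf5" ;;     # HDF5 signature: \x89 H D F \r \n \x1a \n
        7b*)              echo "json" ;;     # starts with "{"
        *)                echo "unknown" ;;
    esac
}

# Stand-in files for demonstration only.
workdir=$(mktemp -d)
printf '\211HDF\r\n\032\n' > "$workdir/fake_v2.biom"
printf '{"id": null}' > "$workdir/fake_v1.biom"

biom_kind "$workdir/fake_v2.biom"
biom_kind "$workdir/fake_v1.biom"
```

If the result is "unknown", or the file is much smaller than expected, the export or add-metadata step probably did not finish cleanly. The biom command line also has a validate-table subcommand (mentioned earlier in this thread as validating) that may give a more specific message.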

andreanuzzo commented 5 years ago

Hi everyone,

I am not sure that issue has been solved yet. My current way to move from Qiime2 to Phyloseq still hasn't betrayed me as of version 2019.4, so I am sharing it if somebody might find it useful.

In command line I do as follows:

qiime tools export \
  table.qza \
  --output-dir biom

qiime tools export \
  taxonomy.qza \
  --output-dir biom

qiime tools export \
  rooted-tree-filtered.qza \
  --output-dir biom

#This is necessary because I have biom in python2
source deactivate
source activate qiime1-1.9.1

cd biom

biom convert \
  -i feature-table.biom \
  -o feature-json.biom \
  --table-type="OTU table" \
  --to-json

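#This step is necessary for the metadata addition to the biom file
#(both header fixes in one sed call, limited to line 1; piping one "sed -i" into another does nothing useful)
sed -i -e '1s/Taxon/taxonomy/' -e '1s/Feature ID/FeatureID/' taxonomy.tsv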

biom add-metadata \
  -i feature-json.biom \
  -o feature_w_tax.biom \
  --observation-metadata-fp taxonomy.tsv \
  --observation-header FeatureID,taxonomy,Confidence \
  --sc-separated taxonomy --float-fields Confidence
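The header rename in the middle of this recipe can be tried on a small sample first. The sketch below is a portable variant that avoids sed -i, whose syntax differs between GNU and BSD sed, by writing to a temporary file. The sample row and the temporary file name are made up.

```shell
#!/bin/sh
# Sketch: rename the taxonomy.tsv header columns without relying on `sed -i`.
workdir=$(mktemp -d)

# Made-up taxonomy export with the assumed QIIME2 header.
printf 'Feature ID\tTaxon\tConfidence\notu1\tk__Bacteria; p__Firmicutes\t0.99\n' \
    > "$workdir/taxonomy.tsv"

# Rewrite only line 1 (the header), then replace the original file.
sed -e '1s/^Feature ID/FeatureID/' -e '1s/Taxon/taxonomy/' \
    "$workdir/taxonomy.tsv" > "$workdir/taxonomy.tmp" \
    && mv "$workdir/taxonomy.tmp" "$workdir/taxonomy.tsv"

head -n 1 "$workdir/taxonomy.tsv"
```

Restricting both substitutions to line 1 keeps the data rows untouched even if a lineage string happens to contain the word "Taxon".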

Once this is done, then the phyloseq object is being built by:

library(phyloseq)
library(tidyverse)

biom_path <- file.path('biom/feature_w_tax.biom')
tree_path <- file.path('biom/tree.nwk')
map_path <- file.path('mapping.txt')
tree <- read_tree(tree_path)

table <- import_biom(BIOMfilename = biom_path,
                      parseFunction = parse_taxonomy_default,   #I use SILVA, so I rename the taxtable afterwards
                      parallel = T)
sample_map <- import_qiime_sample_data(map_path)

phylobj_full <- merge_phyloseq(table, sample_map, tree)

TBH I am planning to abandon Qiime2 and follow the Bioconductor workflow as soon as possible, but I hope this helps anybody who needs it!

Nicheca commented 5 years ago

Hello @brookeweigel! Many thanks for sharing your code! I am struggling with sample_names(TAX), which returns NULL, whereas sample_names(OTU) includes my sample names. I do not know where my mistake could come from. Any idea? Thanks in advance!

spholmes commented 5 years ago

Sometimes an error is thrown when there are duplicates in the sample names, because the program is not allowed to add rownames that have the same value for two different rows. You only see this if you go through it step by step by hand, but it is something we see quite a lot. Susan

-- Susan Holmes John Henry Samter Fellow in Undergraduate Education Professor, Statistics 2017-2018 CASBS Fellow, Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/