joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

How to remove unassigned OTUs from a phyloseq object? #1226

Open Fla1487 opened 5 years ago

Fla1487 commented 5 years ago

Dear All,

I am new to using R to analyze 16S sequencing data. Recently, I sequenced some difficult specimens in which eukaryotic DNA is a problem. Fortunately, I obtained a good run with a high number of reads, but in the first steps of the analysis in R I noticed a lot of unassigned OTUs. I searched this forum but found only topic #652. That is a good solution if we have few OTUs, but I have a larger number, and my OTUs are not named "OTUs" but have more complex names. How can I remove these unassigned OTUs from a phyloseq object? I tried removing the first rank level (Kingdom), but when I generated a graph to check, the unassigned OTUs were still there. I even thought of increasing the stringency of the QIIME 2 parameters.

Thank you so much in advance for your help.

mikemc commented 5 years ago

We will need much more information to try to help. Such as: How did you assign your OTUs / build your OTU table? E.g., with DADA2, or some other method? What are some examples of the OTU names and the taxonomic assignments for the OTUs you want to filter and the OTUs you want to keep? Rather than just telling us which steps you tried in a few words, please also provide the code that you ran and some example output so we can see what the problem with the result was. If you are as specific as possible and provide all potentially relevant information, it saves us from having to go back and forth with more questions and so makes it much more likely for someone to easily be able to volunteer a solution. (I understand these things are not obvious to a new coder, so this is just meant as some guidance for successfully getting help here and in future)

Fla1487 commented 5 years ago

Dear mikemc, thank you in advance. I will try to answer your questions.

To obtain OTUs, or rather ASVs, I used QIIME 2 with the DADA2 method (and Silva as the reference database). At that stage I could discard some OTUs according to their rank classification, but I prefer using R. Regarding the OTUs, here is an example of a name: "9f63898d18c70346a8a78e6fcef5d8aa". The OTUs I want to discard were not assigned as "bacteria"; at the first rank (Kingdom) they are "unassigned". I would like to remove all OTUs whose taxonomic classification is "unassigned" at this first level. My samples are derived from biopsies, so several OTUs (around 10%) do not represent bacteria but eukaryotic contamination. I do not think the approach I mentioned is suitable if I have to remove many OTUs.

Regarding the code, I used:

tax_table(phy_obj) <- tax_table(phy_obj)[,2:7]

I thought that by removing the first rank in the tax_table (where I found bacteria, archaea and unassigned) I could discard the unassigned OTUs.

Thank you again.

mikemc commented 5 years ago

Ok, that clears things up for me quite a bit. Note, using tax_table(phy_obj) <- tax_table(phy_obj)[,2:7] will simply overwrite the tax_table slot of your phy_obj with a version without the Kingdom column, without removing any OTUs. You instead want to use the prune_taxa() or filter_taxa() functions to remove OTUs. Look at the phyloseq tutorials and vignettes for more about filtering samples and taxa, and the help for these functions. In your case, you should be able to use

phy_obj0 <- filter_taxa(phy_obj, Kingdom != "unassigned")

to discard all OTUs that have Kingdom == "unassigned"
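The point about the tax_table slot can be checked on a small example. The sketch below builds a hypothetical toy object (the ASV names, counts, and taxonomy strings are invented for illustration, loosely following the Silva-style labels in this thread) and shows that dropping the Kingdom column changes the ranks but not the number of taxa.

```r
library(phyloseq)

# Hypothetical toy data: 5 ASVs x 3 samples, three taxonomic ranks.
ids <- c("asv1", "asv2", "asv3", "asv4", "asv5")
otu <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     0, 2, 8,
     4, 4, 4,
     1, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("s1", "s2", "s3"))
)
tax <- matrix(
  c("D_0__Bacteria", "D_1__Firmicutes",     "D_2__Bacilli",
    "Unassigned",    NA,                    NA,
    "D_0__Bacteria", "D_1__Proteobacteria", "D_2__Gammaproteobacteria",
    "D_0__Archaea",  "D_1__Euryarchaeota",  "D_2__Methanobacteria",
    "Unassigned",    NA,                    NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("Kingdom", "Phylum", "Class"))
)
toy <- phyloseq(otu_table(otu, taxa_are_rows = TRUE), tax_table(tax))

# Overwrite the taxonomy table with a version that lacks the Kingdom column.
toy_no_kingdom <- toy
tax_table(toy_no_kingdom) <- tax_table(toy)[, 2:3]

# The rank is gone, but every ASV (including the unassigned ones) is still there.
print(rank_names(toy_no_kingdom))
print(ntaxa(toy))
print(ntaxa(toy_no_kingdom))
```

Both ntaxa() calls report the same count, which is why plots made afterwards still contain the unassigned OTUs.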

Fla1487 commented 5 years ago

Thank you so much. I have continued studying the phyloseq tutorial and found the filter_taxa function that you suggested, as well as subset_taxa, to apply before prune_taxa. I will try both methods. Are there important differences between them?

Thank you again.

Fla1487 commented 5 years ago

I applied the suggested code and obtained: "Error in match.fun(FUN) : object 'Kingdom' not found"

Fla1487 commented 5 years ago

Just an update:

I tried these scripts:

GP.chl = subset_taxa(phy_obj, Kingdom=="Unassigned")
phy_obj1 = prune_taxa(taxa_sums(phy.obj) <=10, GP.chl)

Unfortunately, this did not solve the problem. I was not able to remove the unassigned taxa (GP.chl) from the original phy_obj. I do not understand where I went wrong.

However, still using subset_taxa, I can select the bacterial taxa and continue to work. I used:

phy_obj1 = subset_taxa(phy_obj, Kingdom=="D_0__Bacteria")

Is this right? I expected many contaminants because of the type of specimens, but I would like to understand whether this is a correct approach and why not all of the code is suitable. I hope this discussion is useful. Thank you.

mikemc commented 5 years ago

In my earlier response, I incorrectly suggested the filter_taxa() function when I meant subset_taxa(), but I see you've figured that out. It seems like your remaining confusion is over the difference between != (not equal to) and == (equal to) in R. You can do

phy_obj0 = subset_taxa(phy_obj, Kingdom != "unassigned")

to remove all taxa that have Kingdom equal to "unassigned". Note, the match must be exact---this will not remove taxa with Kingdom equal to "Unassigned" (capitalized).
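Because the match is exact, it can help to list the Kingdom labels actually present before filtering. The sketch below uses a hypothetical toy object (all names and counts are invented) in which the label is capitalized as "Unassigned"; it compares the exact lowercase comparison with a case-insensitive one via tolower(). Taxa whose Kingdom is NA are probably dropped by either comparison as well, since subset_taxa() appears to rely on base subset(), but that is an assumption worth checking on real data.

```r
library(phyloseq)

# Hypothetical toy data: 5 ASVs x 3 samples, two taxonomic ranks.
ids <- c("asv1", "asv2", "asv3", "asv4", "asv5")
otu <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     0, 2, 8,
     4, 4, 4,
     1, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("s1", "s2", "s3"))
)
tax <- matrix(
  c("D_0__Bacteria", "D_1__Firmicutes",
    "Unassigned",    NA,
    "D_0__Bacteria", "D_1__Proteobacteria",
    "D_0__Archaea",  "D_1__Euryarchaeota",
    "Unassigned",    NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("Kingdom", "Phylum"))
)
toy <- phyloseq(otu_table(otu, taxa_are_rows = TRUE), tax_table(tax))

# Inspect the exact labels (including any NA) before choosing the string to match.
kingdom <- as(tax_table(toy), "matrix")[, "Kingdom"]
print(table(kingdom, useNA = "ifany"))

# Exact, case-sensitive match: lowercase "unassigned" matches nothing here.
exact <- subset_taxa(toy, Kingdom != "unassigned")

# Case-insensitive match: removes "Unassigned" regardless of capitalization.
case_insensitive <- subset_taxa(toy, tolower(Kingdom) != "unassigned")

print(ntaxa(exact))
print(ntaxa(case_insensitive))
```

In this toy case the exact comparison leaves all five taxa in place, while the case-insensitive one keeps only the three assigned taxa.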

If you only want taxa with Kingdom equal to "D_0__Bacteria", then yes you can do

phy_obj1 = subset_taxa(phy_obj, Kingdom=="D_0__Bacteria")
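
On the question of how these functions differ, a rough sketch on hypothetical toy data (all names and counts invented): subset_taxa() filters on the taxonomy table, prune_taxa() keeps an explicit set of taxa given as names or a logical vector, and filter_taxa() applies a function to each taxon's abundances across samples. The last point is presumably why passing an expression on Kingdom to filter_taxa() failed with the match.fun error.

```r
library(phyloseq)

# Hypothetical toy data: 5 ASVs x 3 samples, two taxonomic ranks.
ids <- c("asv1", "asv2", "asv3", "asv4", "asv5")
otu <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     0, 2, 8,
     4, 4, 4,
     1, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("s1", "s2", "s3"))
)
tax <- matrix(
  c("D_0__Bacteria", "D_1__Firmicutes",
    "Unassigned",    NA,
    "D_0__Bacteria", "D_1__Proteobacteria",
    "D_0__Archaea",  "D_1__Euryarchaeota",
    "Unassigned",    NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("Kingdom", "Phylum"))
)
toy <- phyloseq(otu_table(otu, taxa_are_rows = TRUE), tax_table(tax))

# subset_taxa(): select by taxonomy. Here it returns ONLY the unassigned taxa.
unassigned <- subset_taxa(toy, Kingdom == "Unassigned")

# prune_taxa(): keep an explicit set of taxa. To remove the unassigned taxa
# from the original object, keep every name that is not in `unassigned`.
keep <- setdiff(taxa_names(toy), taxa_names(unassigned))
toy_assigned <- prune_taxa(keep, toy)

# filter_taxa(): apply a function to each taxon's counts across samples.
# Here: keep taxa with more than 10 reads in total.
abundant <- filter_taxa(toy, function(x) sum(x) > 10, prune = TRUE)

print(taxa_names(toy_assigned))
print(taxa_names(abundant))
```

The setdiff() step is one way to turn a subset such as GP.chl into a removal from the original object; the single subset_taxa() call with != gives the same result more directly.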
Fla1487 commented 5 years ago

Dear mikemc, thank you again. I will try this code as well.