joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
569 stars 187 forks source link

How to find highest level of classification for a given OTU #773

Open wurth13 opened 7 years ago

wurth13 commented 7 years ago

I have figured out how to export relative abundance tables according to their taxonomic rank, but am struggling with several "unclassified" which I would prefer to be able to classify at the next highest level. So for genus, I would like to find that same OTU group so that I can report it's family level assignment. Is this possible?

Thank you!!

PratheepaJ commented 7 years ago

Suppose we have a phyloseq object ps.

Let's say that the unclassified genus in the taxonomy table is replaced by NA(missing value.)

We can rename the genus level with the highest known level of taxonomy using the following function.

#   taxonomy table   
tax.tab <- data.frame(tax_table(ps))

ModifyTax <- function(x,ind){
    #   xth row in the dataframe
    #   ind taxonomy level to change
    if(is.na(x[ind])){
        nonNa <- which(!is.na(x[-ind]))
        maxNonNa <- max(nonNa)
        x[ind] <- paste(x[maxNonNa],".",x[ind])
    }else{x[ind] <- x[ind]}
}

#   replace the genus taxonomy with the highest known taxonomy
tax_table(ps)[,6] <- apply(tax.tab,1,ModifyTax,ind=6)

In the above ModifyTax function, we add x[ind] to show that the genus level is unclassified.

jfq3 commented 7 years ago

You can take a look at the way I construct taxonomy tables in RDPutils (GitHub, jfq3/RDPutils). For every rank, not just genus, the taxon name begins with "uncl" or "unclassified" what ever rank the OTU is classified to. This is based on confidence provided by the RDP classifier, utax, or sintax. JQ