joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
567 stars 187 forks source link

Diff Normalization Method- Upload Back Into Phyloseq Format? #1727

Closed hank00000 closed 4 months ago

hank00000 commented 4 months ago

Hi all,

I am trialing using SRS instead of standard log normalization (package "SRS"). I understand that this can't be done inside a phyloseq object, so what I've done is pull out the tax table and otu table, use those to create a merged dataframe, normalize using this data frame and the SRS package.

However, I'd like to re-input this normalized data into my phyloseq object. I've tried just putting it back in as a OTU table, and naming that "normps". However when I try to put that normalized and relative abundance data back into a new phyloseq object, the abundances do not match the original data frame. For example, Taxa A in the normalized/rel abundance data frame had 2-3% in one unamended site, and 0% in amended sites. But in "normps" when graphing, it shows 0% in unamended sites and 2-3% in OTHER amended sites (not the same one that had higher abundances in unamended originally). Basically, my numbers are getting screwed up somewhere and that's worrying.

I've checked that I have the right amount of samples, OTUs, etc. All seems fine until I graph it and it's weird, or I pull the OTU table and it no longer matches my original.

Is there a simple way to go from SRS to phyloseq? Is something mixing up rows along the way? I'll attach how I'm currently doing it but obviously it's important that my data maintains its accuracy through this. I would just graph the data frame as-is, but I want the metadata attached and a phyloseq object does that easily.

Extract OTU table and taxonomy information

otu_table <- as(otu_table(newps)) tax_table <- as(tax_table(newps))

Convert OTU table to a data frame and transpose it

otu_df <- as.data.frame(t(otu_table)) OTUnames <- rownames(otu_df) otu_df$OTU <- rownames(otu_df) # Add OTU names as a column otu_df <- data.frame(OTU = rownames(otu_df), otu_df, row.names = NULL)

Convert taxonomy table to a data frame

tax_df <- as.data.frame(tax_table) tax_df$OTU <- rownames(tax_df) # Add OTU names as a column rownames(tax_df) <- NULL tax_df$Taxa <- apply(tax_df[, -1], 1, function(x) paste(na.omit(x), collapse = "; ")) tax_df <- tax_df[, c("OTU", "Taxa")]

Merge OTU table and taxa table based on OTU identifiers

merged_df <- merge(tax_df, otu_df, by = "OTU", all = TRUE) taxnames <- merged_df$Taxa

Remove the OTU column from the merged table

merged_df <- merged_df[, -1]

row.names(merged_df) <- taxnames

remove taxa column as its in row names now

merged_df <- merged_df[, -1]

remove failed samples

column_sums <- colSums(merged_df)

Select columns with sum not equal to 0

merged_df <- merged_df[, column_sums != 0]

  SRS.shiny.app(merged_df) # To determine CMin

Obtain normalized count data

    normalized <- SRS(merged_df, 2500, set_seed = TRUE, seed = 1) 
    View(normalized)
    normalized <- as.data.frame(normalized)
    rownames(normalized) <- OTUnames #or taxnames depending on if you want to print it

Divide each count by the total count for the corresponding taxon

Relative_Abundance <- normalized Relative_Abundance[, !zero_counts] <- normalized[, !zero_counts] / total_counts[!zero_counts]

Multiply by the scaling factor (e.g., 100)

scaling_factor <- 100

Scale the relative abundances

Relative_Abundance[,] <- Relative_Abundance[,] * scaling_factor

to re-make this into a phyloseq

this seems like such a weird way to go but it's all I got

switch rows and columns back to how they were

transposed_relabun <- data.frame(t(Relative_Abundance), row.names = colnames(Relative_Abundance))

fix metadata to account for removed samples

normsamples <- colnames(Relative_Abundance) as.list(normsamples) metadata_row_names <- rownames(metadata) as.list(metadata_row_names) remove <- metadata_row_names[!metadata_row_names %in% normsamples]

SRS changed metadata names to B.A instead of B_A? So fixing that

rownames(metadata) rownames(metadata) <- gsub("-", ".", rownames(metadata))

metadata_subset <- metadata[!row.names(metadata) %in% remove, ] View(metadata_subset)

normps <- phyloseq(otu_table(transposed_relabun, taxa_are_rows=FALSE), sample_data(metadata_subset), tax_table(tax_table))

Many thanks.

hank00000 commented 4 months ago

Hi all,

I've created a normalized and relative abundance table and made the row names the OTU sequences. I also have a taxa table with row names as sequences, and the tax info as columns. My metadata has rownames as sampleID and the variables as columns.

I get the following affirmations :

rownames(Relative_Abundance) == rownames(tax_table) [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [20] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [39] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [58] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [77] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [96] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [115] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [134] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [153] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [172] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [191] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [210] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [229] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [248] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [267] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [286] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [305] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [324] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

rownames(metadata_subset) == colnames (Relative_Abundance) [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [20] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [39] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [58] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE [77] TRUE TRUE TRUE TRUE TRUE

relnormps <- phyloseq(otu_table(Relative_Abundance), sample_data(metadata_subset), tax_table(tax_table))

But phyloseq says: Error in validObject(.Object) : invalid class “phyloseq” object: Component taxa/OTU names do not match. Taxa indices are critical to analysis. Try taxa_names()

Shouldn't this be what I need to put my normalized and relative abundances back into a phylo form?