microbiome / miaTime

Artistic License 2.0

Conditionally & persistently rare taxa #83

Open · antagomir opened this issue 1 month ago

antagomir commented 1 month ago

In microbiome time-series analyses, taxa are commonly classified as abundant, conditionally rare, persistently rare, or other rare (see, e.g., the two references below).

Conditionally rare taxa can be defined, for instance, as taxa whose maximum relative abundance is at least N times their minimum.

Persistently rare taxa are taxa whose maximum relative abundance never exceeds X times their minimum.

https://doi.org/10.1093/femsec/fix126
https://doi.org/10.1128/mbio.01371-14

It could be helpful to have a function or example showing how to fetch these taxa sets from microbiome time series.
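A minimal base-R sketch of the two ratio rules, assuming a taxa × samples matrix of relative abundances. The function name `classify_by_ratio`, the default values of `N` and `X`, and the `"other"` label are illustrative assumptions, not miaTime API. Because the rules only compare a taxon with itself, they say nothing about absolute abundance; in practice they would probably be applied only to taxa already judged rare by some abundance threshold.

```r
# Hypothetical helper (not part of miaTime): label taxa by the ratio of their
# maximum to minimum relative abundance across samples.
# rel: numeric matrix, taxa in rows, samples in columns.
# N: ratio at or above which a taxon counts as conditionally rare.
# X: ratio at or below which a taxon counts as persistently rare (X < N).
classify_by_ratio <- function(rel, N = 100, X = 10) {
  stopifnot(is.matrix(rel), is.numeric(rel), X < N)
  mx <- apply(rel, 1, max)
  mn <- apply(rel, 1, min)
  # A zero minimum gives Inf, so that taxon counts as conditionally rare;
  # a taxon absent from every sample gives 0/0 = NaN and is labelled NA.
  ratio <- mx / mn
  lab <- ifelse(ratio >= N, "conditionally_rare",
         ifelse(ratio <= X, "persistently_rare", "other"))
  setNames(lab, rownames(rel))
}

rel <- matrix(c(0.001, 0.200,
                0.010, 0.012,
                0.010, 0.500),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("taxonA", "taxonB", "taxonC"), c("t1", "t2")))
classify_by_ratio(rel)
# taxonA: conditionally_rare (ratio 200), taxonB: persistently_rare (1.2),
# taxonC: other (50)
```

Treating zeros as giving an infinite ratio is a design choice; adding a small pseudocount before the division would instead make the ratio finite for taxa that drop to zero.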

ginkgozh commented 3 weeks ago

This article categorizes microorganisms by relative-abundance thresholds:

- always abundant taxa (AAT): ≥1% relative abundance in all samples;
- conditionally abundant taxa (CAT): ≥1% in some samples, but never <0.01%;
- always rare taxa (ART): <0.01% in all samples;
- conditionally rare taxa (CRT): <1% in all samples, but <0.01% in some samples;
- moderate taxa (MT): between 0.01% and 1% in all samples;
- conditionally abundant and rare taxa (CRAT): relative abundances ranging from rare (<0.01%) to abundant (≥1%).

As previously described, CRAT, CAT and AAT are collectively referred to as abundant taxa (AT), while ART and CRT make up rare taxa (RT) (https://www.sciencedirect.com/science/article/pii/S0013935123028347#sec2).
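Written in terms of each taxon's minimum and maximum relative abundance across samples, the six categories can be stated as a single case split. This is a sketch: which side of each boundary gets the equality (rare strictly below 0.01%, abundant at or above 1%) follows the thresholds quoted above and is otherwise an assumption. Let $a_{ij}$ be the relative abundance of taxon $i$ in sample $j$, $m_i = \min_j a_{ij}$, $M_i = \max_j a_{ij}$, $r = 10^{-4}$ (0.01%) and $A = 10^{-2}$ (1%):

```math
\text{category}(i) =
\begin{cases}
\text{AAT} & m_i \ge A \\
\text{CAT} & r \le m_i < A \text{ and } M_i \ge A \\
\text{CRAT} & m_i < r \text{ and } M_i \ge A \\
\text{MT} & m_i \ge r \text{ and } M_i < A \\
\text{CRT} & m_i < r \text{ and } r \le M_i < A \\
\text{ART} & M_i < r
\end{cases}
```

Every taxon satisfies exactly one case, which is why the six categories together cover the whole table without overlap.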

I am currently facing the same issue: I cannot find R code to classify these taxa, and I do not know how to calculate their diversity indices after classification. If you have already solved this problem, please contact me (email: ginkgozh@163.com). Below is the only R code I have found for classifying them.

```r
library(ggplot2)
library(openxlsx)

# OTU table: OTUs in rows, samples in columns (tab-separated text file)
otu <- read.table("C:/Users/heng/Desktop/micro/BF.txt", header = TRUE, row.names = 1, sep = "\t")

# The thresholds assume relative abundances (fractions), so convert each sample
# (column) to proportions; a table whose columns already sum to 1 is unchanged
otu <- as.data.frame(prop.table(as.matrix(otu), margin = 2))

rare <- 1e-4      # 0.01%
abundant <- 1e-2  # 1%
otu_min <- apply(otu, 1, min)
otu_max <- apply(otu, 1, max)

# (i) Always rare taxa (ART): < 0.01% in all samples
otu_ART <- otu[otu_max < rare, ]

# (ii) Always abundant taxa (AAT): >= 1% in all samples
otu_AAT <- otu[otu_min >= abundant, ]

# (iii) Moderate taxa (MT): between 0.01% and 1% in all samples
otu_MT <- otu[otu_min >= rare & otu_max < abundant, ]

# (iv) Conditionally rare taxa (CRT): < 1% in all samples and < 0.01% in only some.
# CRT and ART do not overlap, so OTUs below 0.01% in every sample are excluded.
otu_CRT <- otu[otu_min < rare & otu_max >= rare & otu_max < abundant, ]

# (v) Conditionally abundant taxa (CAT): >= 0.01% in all samples and >= 1% in only some.
# CAT and AAT do not overlap, so OTUs at >= 1% in every sample are excluded.
otu_CAT <- otu[otu_min >= rare & otu_min < abundant & otu_max >= abundant, ]

# (vi) Conditionally rare or abundant taxa (CRAT): range from rare (min < 0.01%)
# to abundant (max >= 1%)
otu_CRAT <- otu[otu_min < rare & otu_max >= abundant, ]

# Note: the six categories do not overlap; together they contain every OTU in
# the table, so their relative abundances sum to 100% in each sample.

# Label each OTU with its category
otu$taxa <- NA_character_
otu$taxa[rownames(otu) %in% rownames(otu_ART)] <- "ART"
otu$taxa[rownames(otu) %in% rownames(otu_AAT)] <- "AAT"
otu$taxa[rownames(otu) %in% rownames(otu_MT)] <- "MT"
otu$taxa[rownames(otu) %in% rownames(otu_CRT)] <- "CRT"
otu$taxa[rownames(otu) %in% rownames(otu_CAT)] <- "CAT"
otu$taxa[rownames(otu) %in% rownames(otu_CRAT)] <- "CRAT"

# Count, per sample, how many present OTUs (abundance > 0) fall into each category;
# fixed factor levels keep every category even when it is absent from a sample
samples <- setdiff(names(otu), "taxa")
taxa_levels <- c("AAT", "CAT", "CRAT", "MT", "CRT", "ART")
otu_stat <- as.data.frame(sapply(samples, function(s)
  table(factor(otu$taxa[otu[[s]] > 0], levels = taxa_levels))))
# Write the table only after otu_stat exists
write.xlsx(otu_stat, file = "CK_otu_stat.xlsx", rowNames = TRUE)

# Long format for plotting: one row per category x sample
otu_stat$taxa <- rownames(otu_stat)
otu_stat <- reshape2::melt(otu_stat, id.vars = "taxa")

ggplot(otu_stat, aes(variable, value, fill = taxa)) +
  geom_col(position = "fill", width = 0.6) +
  theme(panel.grid = element_blank(),
        panel.background = element_rect(color = "gray", fill = "transparent")) +
  scale_y_continuous(expand = c(0, 0)) +
  labs(x = "Sample", y = "Proportion of OTUs in each abundant/rare category")
```
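The plot above counts OTUs per category. For the other question in this comment, diversity after classification, one option is to summarise each category separately. A minimal base-R sketch, assuming a taxa × samples matrix of relative abundances and one category label per taxon (for example the `taxa` column built above): `category_abundance` sums the relative abundance of each category in each sample, and `category_shannon` computes a Shannon index (natural log) within each category after renormalizing that category's abundances to 1. Both function names are hypothetical, and computing Shannon within each category (rather than another index or scope) is an assumption; packages such as vegan also provide `diversity()` for the Shannon index.

```r
# Hypothetical helpers: per-category summaries.
# rel: numeric matrix of relative abundances, taxa in rows, samples in columns.
# labels: character vector with one category (e.g. "AAT", "CRT") per taxon.

# Total relative abundance of each category in each sample (categories x samples)
category_abundance <- function(rel, labels) {
  stopifnot(nrow(rel) == length(labels))
  rowsum(rel, group = labels)
}

# Shannon index (natural log) of each category in each sample, using only the
# taxa present (> 0) and renormalized within the category.
# A category with no present taxa in a sample gets NA.
category_shannon <- function(rel, labels) {
  stopifnot(nrow(rel) == length(labels))
  shannon <- function(x) {
    x <- x[x > 0]
    if (length(x) == 0) return(NA_real_)
    p <- x / sum(x)
    -sum(p * log(p))
  }
  cats <- sort(unique(labels))
  out <- do.call(rbind, lapply(cats, function(g)
    apply(rel[labels == g, , drop = FALSE], 2, shannon)))
  rownames(out) <- cats
  out
}

rel <- matrix(c(0.50, 0.50,
                0.25, 0.10,
                0.25, 0.40),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2")))
labels <- c("AAT", "MT", "MT")
category_abundance(rel, labels)  # each column sums to 1
category_shannon(rel, labels)    # AAT is 0 (single taxon); MT in s1 is log(2)
```

Because every OTU belongs to exactly one category, the columns of `category_abundance()` sum to 1 when `rel` is normalized per sample, which is a quick check on the classification.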