lichen-lab / GMPR

GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data
https://github.com/lichen-lab/GMPR

How to deal with the NA values? #1

Open BinhongLiu opened 4 years ago

BinhongLiu commented 4 years ago

Hi, thank you for developing this useful tool for microbiota data normalization! I'm trying to run the demo data analysis, but I ran into some problems.

When I run the analysis with the installed package (via library):

require(GUniFrac)
require(vegan)
require(DESeq2)
library(GMPR)
data(throat.otu.tab)
data(throat.meta)
###########################################################################################################
# Calculate GMPR size factor
# Row - features, column - samples
otu.tab <- t(throat.otu.tab)
gmpr.size.factor <- GMPR(t(otu.tab))

Warning message:
In if (!(class(OTUmatrix) %in% c("data.frame", "matrix"))) stop("Unknown datatype of object \"OTUmatrix\".") :
  the condition has length > 1 and only the first element will be used

When I run the analysis by sourcing the function directly:

source("C:/Users/Administrator/Linux/Scripts/GMPR-master/GMPR.R")
gmpr.size.factor <- GMPR(t(otu.tab))
Begin GMPR size factor calculation ...
50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850
Completed!
Please watch for the samples with limited sharing with other samples based on NSS! They may be outliers!
Warning message:
In GMPR(t(otu.tab)) :
  The following samples
4695 2983 2554 3315 879 1313 5661 4125 2115 3309 3225 514 3427 484 2894 5523 652 5160 3349 4526 4925 3202 4716 3015 5046 1477 1873 1583 1203 5047 3428 4399 642 4499 2740 5334 3067 4883 5162 1655 4220 39 2371 3600 378 4864 5291 1651 3574 1635 4472 5456 3026 189 784 1091 742 368 4564 420 501 5522 4306 1026 2981 5414 232 1995 3231 1044 1639 5280 1037 5483 3546 1623 3352 4969 5217 3992 3097 2556 2239 263 1592 5054 4497 1867 1219 1986 4733 4666 4847 5201 2906 1903 571 2660 4697 3302 4831 1981 1796 765 2313 2098 2002 5547 5655 5675 911 991 3699 3353 2743 978 285 4771 1596 4494 2198 4022 1755 2982 323 4608 1278 4877 667 2761 1160 3758 483 2757 3075 2122 3715 4708 502 777 906 3020 3492 1419 351 2729 440 4967 2836 607 3703 4840 1283 2570 5581 4417 65 231 5308 2718 5227 5286 3039 5679 2046 1421 3390 679 2582 4589 5119 2398 955 2263 4354 1448 308 4424 4298 2334 2153 4721 2301 2595 3938 1795 4775 2025 4478 1476 3969 4693 943 4248 1561 5269 467 2481 933 5167 4013 2980 3865 44 [... truncated]

otu.tab.norm <- t(t(otu.tab) / gmpr.size.factor)
View(otu.tab.norm)
head(otu.tab.norm)
     ESC_1.1_OPL ESC_1.3_OPL ESC_1.4_OPL ESC_1.5_OPL ESC_1.6_OPL ESC_1.10_OPL ESC_1.11_OPL
4695          NA          NA          NA          NA          NA           NA           NA
2983          NA          NA          NA           0          NA           NA           NA
2554           0          NA           0          NA          NA           NA            0
3315          NA          NA          NA          NA          NA           NA           NA
879           NA           0          NA          NA          NA           NA            0
1313          NA          NA          NA          NA          NA           NA           NA
     ESC_1.12_OPL ESC_1.13_OPL ESC_1.14_OPL ESC_1.15_OPL ESC_1.18_OPL ESC_1.19_OPL ESC_1.20_OPL
4695           NA           NA           NA           NA           NA           NA           NA
2983            0           NA           NA           NA           NA           NA           NA
2554           NA           NA            0           NA           NA           NA           NA
3315           NA           NA           NA           NA           NA           NA           NA
879            NA           NA           NA           NA           NA           NA           NA
1313           NA           NA            0           NA           NA           NA           NA
     ESC_1.21_OPL ESC_1.22_OPL ESC_1.23_OPL ESC_1.24_OPL ESC_1.25_OPL ESC_1.26_OPL ESC_1.27_OPL
4695            0           NA           NA           NA           NA           NA            0
2983           NA           NA           NA           NA           NA           NA            0
2554           NA           NA           NA           NA           NA           NA           NA
3315           NA           NA           NA           NA           NA           NA           NA
879             0           NA           NA            0            0           NA           NA
1313           NA           NA           NA            0           NA           NA           NA
     ESC_1.28_OPL ESC_1.29_OPL ESC_1.30_OPL ESC_1.31_OPL ESC_1.32_OPL ESC_1.33_OPL ESC_1.34_OPL
4695           NA           NA           NA           NA           NA            0           NA
2983           NA           NA            0           NA           NA           NA    0.0000000
2554           NA           NA           NA           NA            0           NA           NA
3315           NA           NA           NA           NA           NA           NA    0.4716809
879            NA           NA           NA           NA           NA           NA           NA
1313           NA           NA           NA           NA            0           NA           NA
     ESC_1.35_OPL ESC_1.36_OPL ESC_1.37_OPL ESC_1.39_OPL ESC_1.40_OPL ESC_1.42_OPL ESC_1.43_OPL
4695           NA           NA           NA           NA            0           NA           NA
2983           NA           NA           NA           NA           NA            0           NA
2554           NA           NA           NA           NA           NA           NA           NA
3315           NA           NA           NA            0           NA           NA           NA
879            NA           NA           NA           NA            0           NA           NA
1313           NA           NA           NA           NA           NA           NA           NA
     ESC_1.44_OPL ESC_1.45_OPL ESC_1.46_OPL ESC_1.47_OPL ESC_1.48_OPL ESC_1.49_OPL ESC_1.50_OPL
4695           NA           NA           NA           NA           NA           NA           NA
2983           NA           NA           NA           NA            0           NA           NA
2554           NA           NA            0            0           NA           NA     1.141567
3315           NA           NA           NA           NA           NA            0     0.000000
879            NA           NA           NA           NA           NA            0     0.000000
1313           NA            0           NA           NA           NA           NA           NA
     ESC_1.51_OPL ESC_1.52_OPL ESC_1.53_OPL ESC_1.55_OPL ESC_1.56_OPL ESC_1.57_OPL ESC_1.58_OPL
4695           NA           NA     8.252633           NA            0           NA           NA
2983           NA            0           NA            0           NA           NA           NA
2554           NA           NA           NA           NA           NA            0            0
3315            0           NA           NA           NA           NA           NA           NA
879            NA            0           NA           NA           NA           NA           NA
1313           NA           NA     0.000000           NA           NA           NA           NA
     ESC_1.59_OPL ESC_1.60_OPL ESC_1.61_OPL ESC_1.62_OPL ESC_1.63_OPL ESC_1.64_OPL ESC_1.65_OPL
4695            0           NA           NA           NA           NA            0           NA
2983           NA           NA           NA           NA           NA           NA            0
2554            0           NA           NA           NA     2.110897           NA           NA
3315           NA            0           NA           NA           NA           NA            0
879             0           NA           NA           NA     0.000000           NA            0
1313           NA           NA           NA           NA           NA           NA           NA
     ESC_1.67_OPL ESC_1.68_OPL ESC_1.69_OPL ESC_1.70_OPL
4695           NA           NA           NA            0
2983           NA           NA           NA           NA
2554            0           NA           NA           NA
3315           NA           NA           NA           NA
879            NA           NA           NA           NA
1313           NA           NA           NA           NA

How should I deal with these NAs in the normalized table? Thank you!
Hongbin Liu

teyden commented 3 years ago

I suspect there is a bug in the example, and I would love clarification from the authors. (Again, thanks for doing this research.)

The example shows this:

otu.tab <- t(throat.otu.tab)
gmpr.size.factor <- GMPR(otu.tab)

dim(otu.tab)
[1] 856  60

otu.tab.norm <- t(t(otu.tab) / gmpr.size.factor)

Here GMPR is computed on otu.tab (856 OTUs x 60 samples), in which the rows are OTUs and the columns are samples. But according to the documentation, the OTU matrix should have OTUs arranged in columns and samples in rows. Following the example through to the normalized OTU table produces many NAs. I believe this is because gmpr.size.factor itself contains many NAs: the size factors are being computed over the OTUs when they should be computed over the samples, since library size is a property of a sample.
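A minimal sketch of how such NAs can spread, using made-up toy counts rather than the throat data (the matrix `counts` and both size-factor vectors are hypothetical). A per-sample size factor has one entry per column of the feature-by-sample table. A vector with one entry per OTU is recycled silently by R when the lengths happen to divide evenly, so no error or warning points at the mismatch.

```r
# Toy feature-by-sample count matrix: 4 OTUs (rows) x 2 samples (columns).
counts <- matrix(c(10, 0, 4, 6,
                   20, 2, 0, 12),
                 nrow = 4,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2")))

# Intended orientation: one size factor per sample (length == ncol(counts)).
sf_per_sample <- c(S1 = 0.5, S2 = 2)
norm_ok <- t(t(counts) / sf_per_sample)

# Wrong orientation: one value per OTU, some of them NA (hypothetical values).
sf_per_otu <- c(1, NA, NA, 2)
norm_bad <- t(t(counts) / sf_per_otu)  # recycled without a warning

# Quick diagnostics worth running before trusting a normalized table.
length(sf_per_sample) == ncol(counts)  # does the vector match the samples?
length(sf_per_otu) == ncol(counts)
anyNA(norm_ok)
sum(is.na(norm_bad))                   # number of NA cells introduced
```

On the real data, the same two checks, `length(gmpr.size.factor) == ncol(otu.tab)` and `sum(is.na(gmpr.size.factor))`, should show quickly whether the size factors were computed over the intended dimension.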

Based on @BinhongLiu's comment, it seems you caught this error and computed gmpr.size.factor <- GMPR(t(otu.tab)) instead, which I think is the correct call, so the downstream computation should work fine. I tried it and it worked for me, without producing any NAs for this example.
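One way to make the orientation explicit before normalizing is sketched below. `normalize_counts` is a hypothetical helper, not part of GMPR. It assumes a feature-by-sample count matrix and a size-factor vector computed elsewhere, and it stops if the vector does not line up with the samples.

```r
# Hypothetical helper (not part of the GMPR package): divide each sample
# (column) of a feature-by-sample matrix by that sample's size factor.
normalize_counts <- function(counts, size_factor) {
  stopifnot(is.matrix(counts) || is.data.frame(counts))
  counts <- as.matrix(counts)
  if (length(size_factor) != ncol(counts)) {
    stop("Need one size factor per sample: got ", length(size_factor),
         " values for ", ncol(counts), " samples.")
  }
  if (anyNA(size_factor)) {
    stop("Size factors contain NA for ", sum(is.na(size_factor)), " sample(s).")
  }
  # sweep() divides column j by size_factor[j].
  sweep(counts, MARGIN = 2, STATS = size_factor, FUN = "/")
}

# Toy usage with made-up numbers: 3 OTUs x 2 samples.
toy <- matrix(c(4, 0, 8, 3, 9, 0), nrow = 3,
              dimnames = list(c("a", "b", "c"), c("S1", "S2")))
toy_norm <- normalize_counts(toy, c(2, 3))
```

The explicit length and NA checks turn a silent recycling problem into an error at the point where the mismatch happens, which is easier to debug than a table full of NAs.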

For reference, the documentation describes the input as: "An OTU table matrix, where OTUs arranged in columns and samples in rows."
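The first warning in the original report comes from the input check quoted in the message, `class(OTUmatrix) %in% c("data.frame", "matrix")`. In R 4.0.0 and later a matrix has class `c("matrix", "array")`, so that test returns two values where `if` expects one. The sketch below illustrates this on a toy matrix and shows one possible length-one alternative. `is_supported_input` is a hypothetical name, and this is not the package's actual code.

```r
x <- matrix(1:6, nrow = 2)

class(x)                                   # two entries on R >= 4.0.0
class(x) %in% c("data.frame", "matrix")    # one logical per class entry

# Hypothetical replacement check that always yields a single TRUE/FALSE.
is_supported_input <- function(obj) {
  is.matrix(obj) || is.data.frame(obj)
}

is_supported_input(x)                      # TRUE
is_supported_input(as.data.frame(x))       # TRUE
is_supported_input(list(1, 2))             # FALSE
```

This warning concerns only the type check and is separate from the NA problem, which comes from the orientation of the table.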