Closed akhst7 closed 2 years ago
I have the same error.... I am running microbiomeMarker (v 1.0.2) in R 4.1.1
@cpavloud
Can you post a result of following ?
str(sample_data(phyloseq.obj))
Yes, of course
> str(sample_data(physeq)) 'data.frame': 26 obs. of 2 variables: Formal class 'sample_data' [package "phyloseq"] with 4 slots ..@ .Data :List of 2 .. ..$ : chr "kidneys with calcification" "SG_affected" "SG_affected" "SG_affected" ... .. ..$ : chr "Sick" "Sick" "Sick" "Sick" ... ..@ names : chr "Condition" "Health_state" ..@ row.names: chr "ASB1_LL12_1" "KOK1_LL1_1" "KOK10_LL5_2" "KOK2_LL1_2" ... ..@ .S3Class : chr "data.frame"
The functions runs if I have only group = "Condition"
or only group = "Health_state"
But I get the error when I try to run with group = "Health_state"
and subgroup = "Condition"
@cpavloud ,
I did remember now that there is a bug in the code for run-lefse, right in area of test_rep_wilcoxon
within lefse-utilities.R
, which has not been fixed. Only thing you can do is to make another column of combined two groups in to one ;
Sex Treatment Sex_Treatment
1 M HH HH_M
2 M HH HH_M
3 F HH HH_F
Alternatively, you can start using another package called, MicrobiomeProcess. It will let you do lefse with second group as follows;
mp_diff_analysis(
.data = ps,
.abundance=normalize,
.group= Treatment,
.sec.group = Sex,
It is pretty simple to use but powerful. Overall, it is a better package for a 16S microbiome analysis.
Thank you! Great package! I didn't know about it. And I also found this very nice website: https://yulab-smu.top/MicrobiotaProcessWorkshop/articles/MicrobiotaProcessWorkshop.html
@cpavloud, Yeah, it is an upcoming R package for the Microbiome stuff. The link you had is actually for the older version, though commands listed in this tutorial still work. I like the older version better than the newer one.
I am gonna close this issue, since I don't think the developer will fix this issue any time soon.
@akhst7 @cpavloud Thanks for your interested in microbiomeMarker, and I'm Sorry for the later reply. Could you provide a reproducible example that would help me to fix this issue?
For more details on how to make a great minimal reproducible example, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://www.tidyverse.org/help/#reprex.
I cannot use reprex to create a reproducible example as you want it, as it only renders the last line of code in the console.
I am using R version 4.1.1 and tidyverse (1.3.1), phyloseq (1.38.0) and microbiomeMarker (1.0.2)
When I try to run lefse with a group and a subgroup
OTU_lefse <- run_lefse( physeq, group = "Health_state", subgroup = "Condition", taxa_rank = "none", transform = c("identity"), norm = "CPM", kw_cutoff = 0.05, lda_cutoff = 2, bootstrap_n = 30, bootstrap_fraction = 2/3, wilcoxon_cutoff = 0.05, multigrp_strat = FALSE, strict = "0", sample_min = 7, only_same_subgrp = FALSE, curv = FALSE )
I get this error
Error in if ((sx < sy) != dir_cmp || sx == sy) { : missing value where TRUE/FALSE needed
But when I run it with group = "Health_state"
and no subgroup
I get a result
OTU_lefse microbiomeMarker-class inherited from phyloseq-class normalization method: [ CPM ] microbiome marker identity method: [ lefse ] marker_table() Marker Table: [ 8 microbiome markers with 5 variables ] otu_table() OTU Table: [ 460 taxa and 26 samples ] sample_data() Sample Data: [ 26 samples by 2 sample variables ] tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]
Also, when I run it with group = "Condition"
and no subgroup
again I get a result
OTU_lefse microbiomeMarker-class inherited from phyloseq-class normalization method: [ CPM ] microbiome marker identity method: [ lefse ] marker_table() Marker Table: [ 19 microbiome markers with 5 variables ] otu_table() OTU Table: [ 460 taxa and 26 samples ] sample_data() Sample Data: [ 26 samples by 2 sample variables ] tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]
This is what my sample_data look like
sample_data(OTU_lefse) Sample Data: [26 samples by 2 sample variables]: Condition Health_state ASB1_LL12_1 kidneys with calcification Sick KOK1_LL1_1 SG_affected Sick KOK10_LL5_2 SG_affected Sick KOK2_LL1_2 SG_affected Sick KOK3_LL1_3 SG_affected Sick KOK4_LL5_3 SG_affected Sick KOK5_MM2_1 SG_affected Sick KOK6_LL12_2 SG_affected Sick KOK7_LL12_4 SG_affected Sick KOK8_LL12_3 SG_affected Sick ASB10_OO10_4 kidneys with calcification Sick KOK9_LL5_1 SG_affected Sick O_1_OO7_2 healthy Healthy O_2_OO3_3 healthy Healthy O_3_MM9_1 healthy Healthy O_4_MM9_4 healthy Healthy O_5_NN6_1 healthy Healthy O_6_NN6_2 healthy Healthy O_8_MM9_3 healthy Healthy ASB2_NN8_2 kidneys with calcification Sick ASB3_NN8_3 kidneys with calcification Sick ASB4_NN11_2 kidneys with calcification Sick ASB5_OO3_1 kidneys with calcification Sick ASB6_OO7_1 kidneys with calcification Sick ASB7_OO7_3 kidneys with calcification Sick ASB8_OO10_1 kidneys with calcification Sick
@cpavloud Please install the latest development version (1.3.2) from github in which your issue has been fixed.
devtools::install_github("yiluheihei/microbiomeMarker")
Ok! That worked! Thank you!
@cpavloud
Could you do me a big favor ? If you have a chance, could you run lefse on MicrobiotaProcess and compared to MicrobiomeMaker and post the result somewhere so that I can see them. Thanks.
@akhst7 lefse in microbiomeMarker is a reimplementation of LEfSe. For details of the method, please see the LEfSe paper
This is quite interesting.
The result of mp_diff_analysis
function
new_lefse <- mp_diff_analysis( .data = MPSE, .abundance=normalize, .group= Health_state, .sec.group = Condition, relative = TRUE)
is that There are not significantly discriminative features after internal first test !
Also, as probably expected, the result of the diff_analysis
function
deres <- diff_analysis(obj = physeq, classgroup = "Health_state", subclass = "Condition", mlfun = "lda", filtermod = "pvalue", firstcomfun = "kruskal_test", firstalpha = 0.05, strictmod = TRUE, secondcomfun = "wilcox_test", subclmin = 3, subclwilc = TRUE, secondalpha = 0.01, lda=3)
is that There are not significantly discriminative features after internalwilcox_test !
However, the run_lefse
function identifies 2 microbiome markers
microbiomeMarker-class inherited from phyloseq-class normalization method: [ CPM ] microbiome marker identity method: [ lefse ] marker_table() Marker Table: [ 2 microbiome markers with 5 variables ] otu_table() OTU Table: [ 460 taxa and 26 samples ] sample_data() Sample Data: [ 26 samples by 2 sample variables ] tax_table() Taxonomy Table: [ 460 taxa by 1 taxonomic ranks ]
Object of class "marker_table" feature enrich_group ef_lda pvalue padj marker1 Otu16 Healthy 3.503214 0.04537015 0.04537015 marker2 Otu185 Sick 3.041596 0.02413744 0.02413744
@cpavloud
OK that's what I thought you would get. I get the exactly the same result. Initially I thought it was due to different default normalization approaches used by MP and MB. MP uses "hellinger"(I think, according to the vignette) while MB uses "cpm". But then, when I swapped MP's RelRareAbundance with MB's cpm, and did diff_analysis, I got the same result. I have not done the opposite and have not manually calculate lefse (basically annova followed by LDA) Huttonhower's lefse package yet but it appears that different normalization significantly affects lefse results, particularly data with a borderline diff. abundance. Perhaps, yiluheihei could address which normalization method we should be using for lefse.
@akhst7 As you said, the choice of normalization method has large effect on the result of different analysis, and there is no consensus on which method is better than others. For lesfse, I recommend use the "CPM", the default normalization method in lefse paper. And try other methods if the result is not good enough.
Moreever, microbiomeMarker currently provides a function compare_DA()
which can be used to perform comparison of different methods and help users to choose a suitable normalization/differential method for their own dataset.
library(microbiomeMarker)
data(kostic_crc)
# Remove taxa not seen in at least 20% of the samples
kostic_crc <- phyloseq::filter_taxa(
kostic_crc,
function(x) sum(x > 0) > (0.2*length(x)), TRUE)
compare_res <- compare_DA(
ps = kostic_crc,
group = "DIAGNOSIS",
methods = "lefse",
args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS"))),
BPPARAM = BiocParallel::SnowParam(workers = 2, progressbar = TRUE),
n_rep = 2,
effect_size = 5)
#> | | | 0% | |================== | 25% | |=================================== | 50% | |==================================================== | 75% | |======================================================================| 100%
compare_summary <- summary(compare_res)
#> Warning: Best score is <= 0.
#> You might require to preprocessing your data or re-run with a higher effect size.
compare_summary
#> call
#> 2 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "TSS")
#> 1 run_lefse(ps = "kostic_crc", group = "DIAGNOSIS", taxa_rank = "none", norm = "CPM")
#> auc fpr power fdr score score_0.05 score_0.95
#> 2 0.5000000 0.00000000 0.0000000 0.0000000 0.0000000 0.00 0.0000000
#> 1 0.8755844 0.03832834 0.3333333 0.6388889 -0.5136941 -0.55 -0.4823232
Created on 2022-06-07 by the reprex package (v2.0.1)
The higher the score
in the summary result, the better the method is estimated to be. However, this feature is in beta version, and only taxa_rank = "none"
is allowed.
Hope this helps.
Thanks. It is much clear now.
I run a following phyloseq obj;
I run a following;
I get an error as follows;
Here is a sample_data of ps;
what am I missing here ?
Thanks.