vmikk / metagMisc

Miscellaneous functions for metagenomic analysis.
MIT License

error in function phyloseq_filter_prevalence #28

Closed J5886 closed 10 months ago

J5886 commented 1 year ago

Hello,

I'm using the function phyloseq_filter_prevalence to filter a phyloseq object. With R version 4.2.3 it works correctly, but after I upgraded R to version 4.3.1 I got the error "OTU abundance data must have non-zero dimensions" with the same data. I switched R back to version 4.2.3 to check again, and it worked. The command and the error:

physeq3 <- phyloseq_filter_prevalence(physeq.gen, prev.trh = 0.5, abund.trh = 10, threshold_condition = "OR")
Error in validObject(.Object) : invalid class “otu_table” object: OTU abundance data must have non-zero dimensions.

Thanks in advance for your help.

vmikk commented 1 year ago

Hello!

Thank you for reporting the issue. However, I could not reproduce it. E.g., the function works for the GlobalPatterns data:

library(phyloseq)
library(metagMisc)
data(GlobalPatterns)
pf <- phyloseq_filter_prevalence(GlobalPatterns, prev.trh = 0.5, abund.trh = 10, threshold_condition = "OR")
pf

I think that the filtering thresholds you set are too strict for your data, which results in the removal of all OTUs/ASVs (there are no OTUs that occur in 50% of the samples or have an abundance greater than 10 reads). The message "OTU abundance data must have non-zero dimensions" means that the otu_table function from phyloseq cannot construct the object without data.

That said, I should probably add a more meaningful error message for this case.
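
To see whether a given pair of thresholds would leave any taxa at all, prevalence and total abundance can be computed by hand before filtering. The following is a minimal sketch on a toy count matrix in base R. The matrix, the variable names, and the comparison operators are assumptions for illustration and may not match how phyloseq_filter_prevalence applies its thresholds internally.

```r
# Toy count matrix: taxa in rows, samples in columns (hypothetical data).
counts <- matrix(
  c(3, 0, 0, 0,
    1, 0, 0, 0,
    0, 2, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:4))
)

prev.trh  <- 0.5   # fraction of samples in which a taxon must occur
abund.trh <- 10    # total read count a taxon must reach

prevalence      <- rowSums(counts > 0) / ncol(counts)  # share of samples containing the taxon
total_abundance <- rowSums(counts)                     # reads summed over all samples

# "OR" condition: keep a taxon if it passes either threshold (assumed semantics).
keep <- prevalence >= prev.trh | total_abundance >= abund.trh

sum(keep)   # number of taxa that would remain after filtering
```

If the count is zero, no taxa are left to build an otu_table from, which is the situation the error message describes. For a real phyloseq object, the count matrix could be obtained with as(otu_table(physeq), "matrix"), after checking the orientation with taxa_are_rows(physeq).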

HTH, With kind regards, Vladimir

J5886 commented 1 year ago

Hi @vmikk, thanks for your reply. I understand, but even when I reduced the prevalence threshold to 10% (prev.trh = 0.1) and the abundance threshold to 5 reads, I got the same error. With the old version of R I used various thresholds and it worked smoothly. I checked with other data and got the error again.

Best regards

hkiesewalter commented 1 year ago

Hi, I have the exact same issue as described by J5886 with my own data. That error does not appear when using the GlobalPatterns data, but the number of remaining taxa after filtration is very low and not the same as in your example (see below).

I noticed it after updating R to 4.3.2. In my case, downgrading R does not help anymore.

[Image: Screenshot 2023-11-18 at 19 05 14]

tlmontgo commented 1 year ago

Hi all,

I have this issue as well if I do a clean install of R with the newest version of Rtools and the updated Bioconductor versions of everything.

I can use this function successfully with the exact same phyloseq object if I instead use R version 4.2.1 with Rtools 4.0 and Bioconductor version 3.15.

I can't say I know what the 'problem' is with the updated R, but I thought this might help someone who knows better than me pinpoint the issue, or at least give others a workaround through version control.

hkiesewalter commented 1 year ago

Hi all,

I installed the old version of metagMisc (0.0.4) manually under the newest R version (4.3.2), and the function [phyloseq_filter_prevalence] works fine.

So, I guess the error is somehow in the new version of metagMisc (0.5.0)?

tlmontgo commented 1 year ago

Thanks so much for the quick reply.

Installing metagMisc version 0.0.4 under the newest R version fixes it for me as well. So yes, it seems to be an issue with version 0.5.0.

JH966 commented 11 months ago

How do I install version 0.0.4?

vmikk commented 11 months ago

@J5886, @tlmontgo, @hkiesewalter, and @JH966, Thank you for bringing this issue to my attention, and I apologize for the delay in responding.

To effectively diagnose and resolve the problem, I need to replicate the issue on my end. Could one of you kindly provide a small subset of data where the function fails? This will help me to understand what causes the issue and to work towards a solution. Please ensure that any sensitive information is removed or anonymized before sharing.

JH966 commented 10 months ago

Hi, how do I install the old version? Please tell me.

vmikk commented 10 months ago

To install the older version, you may use

devtools::install_github("vmikk/metagMisc@v.0.0.4")   # v.0.0.4

or

remotes::install_github("vmikk/metagMisc@v.0.0.4")

hkiesewalter commented 10 months ago

Hi,

I have attached a dataset and an R script that reproduce the error with my data and give wrong filtration results on the GlobalPatterns dataset.

Best regards

R_metagMisc.zip

vmikk commented 10 months ago

@hkiesewalter, thank you for sharing the dataset and R script. It was very helpful for pinpointing the problem!

I've identified and fixed the issue (closed with commit 12a8b2e).
It was related to the recent update of the prevalence function, which now uses the data.table package for better performance. However, data.table does not use row names, which is where OTU IDs were previously stored. As a result, subsetting the phyloseq object was not functioning as intended.
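
The row-name behaviour described above can be illustrated with a small sketch. It is not the code from the commit. The toy data frame and the column name "OTU" are assumptions for illustration, and only the documented keep.rownames argument of as.data.table is used.

```r
library(data.table)

# Toy table with OTU IDs stored as row names (hypothetical data).
df <- data.frame(
  Prevalence     = c(3, 1, 2),
  TotalAbundance = c(40, 2, 15),
  row.names      = c("OTU1", "OTU2", "OTU3")
)

dt_lost <- as.data.table(df)                         # row names are dropped
dt_kept <- as.data.table(df, keep.rownames = "OTU")  # row names become the column "OTU"

names(dt_lost)   # no column holds the OTU IDs
names(dt_kept)   # the first column is "OTU"

# With the IDs in an explicit column, taxa can be selected by ID again.
dt_kept[TotalAbundance > 10, OTU]
```

Once the IDs live in an explicit column, the selected IDs could be handed to a function such as phyloseq::prune_taxa to subset the original object.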

Please pull the latest commit of the package to incorporate these fixes.

remotes::install_github("vmikk/metagMisc")

JH966 commented 10 months ago

Thank you!!! Thank you very, very much!!!

hkiesewalter commented 10 months ago

@vmikk Thanks a lot for fixing it so quickly 👍 :)