Open tizioto opened 5 years ago
The speedyseq package will help with this; it has a much faster implementation of tax_glom: https://github.com/mikemc/speedyseq
@mikemc can tell you more.
phyloseq::tax_glom() gets much slower as the number of taxa increases. In your case the number of taxa is extremely large (66781), which is why it is taking so long. But as @benjjneb said, I released an add-on package with a much faster implementation of tax_glom(), with instructions for use at https://github.com/mikemc/speedyseq. I had never tried it with such a large number of taxa (or samples) before, but it seems to work pretty well: a genus-level tax_glom takes ~5 seconds on my laptop.
library(speedyseq)
ps <- readRDS("ps2.all.rds")
ps
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 66781 taxa and 1420 samples ]
#> sample_data() Sample Data: [ 1420 samples by 20 sample variables ]
#> tax_table() Taxonomy Table: [ 66781 taxa by 8 taxonomic ranks ]
#> phy_tree() Phylogenetic Tree: [ 66781 tips and 66050 internal nodes ]
system.time(ps1 <- tax_glom(ps, "genus"))
#> user system elapsed
#> 4.903 0.492 5.409
The current version of speedyseq (v0.1.0) is archived on Zenodo (https://zenodo.org/badge/latestdoi/179732395), making it citable and suitable for use in reproducible workflows.
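For anyone who wants to try the comparison without the large file from this thread, here is a minimal sketch using the GlobalPatterns dataset that ships with phyloseq. It assumes both packages are installed (speedyseq via the remotes package, as an assumption about your setup), and it calls each function by namespace so it is clear which implementation runs. Note that GlobalPatterns names the rank "Genus", whereas the object above uses "genus". Timings will differ by machine, so none are shown.

```r
# One-time install, assuming the 'remotes' package is available:
# remotes::install_github("mikemc/speedyseq")

library(phyloseq)

# Example dataset bundled with phyloseq; a stand-in for the object in this thread.
data(GlobalPatterns)

# Time each implementation, calling it explicitly by namespace.
t_phyloseq  <- system.time(ps_slow <- phyloseq::tax_glom(GlobalPatterns, "Genus"))
t_speedyseq <- system.time(ps_fast <- speedyseq::tax_glom(GlobalPatterns, "Genus"))

# Compare elapsed times and the number of taxa left after agglomeration.
print(rbind(phyloseq = t_phyloseq, speedyseq = t_speedyseq)[, "elapsed"])
print(c(original = ntaxa(GlobalPatterns),
        phyloseq = ntaxa(ps_slow),
        speedyseq = ntaxa(ps_fast)))
```

Loading speedyseq with library(speedyseq), as in the example above, has the same effect for tax_glom(ps, "genus") without the explicit namespace prefix.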
Dear all, This worked perfectly. Thank you. Best Regards
Dra. Polyana Tizioto NGS Soluções Genômicas
Dear, I have the phyloseq object https://drive.google.com/file/d/1U-YdB5v3oEjAyUn4V5EXi_L8RJJI6e5n/view?usp=sharing and am trying to run tax_glom, but it is taking too long. It has been running for more than 3 days. Do you have any idea why this is happening? Thank you Best regards