jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

The "tax" parameter of the "subsetTax" function could accept a list of taxa #749

Open glenjasper opened 10 months ago

glenjasper commented 10 months ago

Hi,

I needed to subset my project (with the subsetTax function) to several taxa (phyla) at once, but the tax parameter doesn't accept a list. I had to make one subset per taxon and then combine them; it took a while but it worked, although it would take longer for a larger list. It would be useful for the tax parameter to accept a list of taxa, as the plotTaxonomy function does.

Subset of a list of taxa (ideal)

my_project_fungi = subsetTax(my_project, rank = 'phylum', tax = c('Ascomycota', 'Basidiomycota', 'Blastocladiomycota', 'Chytridiomycota', 'Cryptomycota', 'Mucoromycota', 'Olpidiomycota', 'Zoopagomycota'), rescale_copy_number = F)

Subset by taxon (the way I did it)

subset_fungi_ascomycota = subsetTax(my_project, rank = 'phylum', tax = 'Ascomycota', rescale_copy_number = F)
subset_fungi_basidiomycota = subsetTax(my_project, rank = 'phylum', tax = 'Basidiomycota', rescale_copy_number = F)
subset_fungi_blastocladiomycota = subsetTax(my_project, rank = 'phylum', tax = 'Blastocladiomycota', rescale_copy_number = F)
subset_fungi_chytridiomycota = subsetTax(my_project, rank = 'phylum', tax = 'Chytridiomycota', rescale_copy_number = F)
subset_fungi_cryptomycota = subsetTax(my_project, rank = 'phylum', tax = 'Cryptomycota', rescale_copy_number = F)
subset_fungi_mucoromycota = subsetTax(my_project, rank = 'phylum', tax = 'Mucoromycota', rescale_copy_number = F)
subset_fungi_zoopagomycota = subsetTax(my_project, rank = 'phylum', tax = 'Zoopagomycota', rescale_copy_number = F)

my_project_fungi = combineSQM(subset_fungi_ascomycota, subset_fungi_basidiomycota, subset_fungi_blastocladiomycota, subset_fungi_chytridiomycota, subset_fungi_cryptomycota, subset_fungi_mucoromycota, subset_fungi_zoopagomycota, tax_source = 'contigs', rescale_copy_number = F)

Best, Glen

fpusan commented 10 months ago

This can already be done with a bit of kung fu:

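# Subset an SQM object to several taxa at once: one subsetTax() call per
# taxon, with the resulting subsets merged by combineSQM().
subsetTaxMulti = function(SQM, rank, taxa,
                          trusted_functions_only = FALSE,
                          ignore_unclassified_functions = FALSE,
                          rescale_tpm = TRUE,
                          rescale_copy_number = TRUE)
    {
    # One subset per taxon; the options are passed by name so they do not
    # depend on the argument order of subsetTax().
    subs = lapply(taxa, FUN = function(tax)
        subsetTax(SQM, rank = rank, tax = tax,
                  trusted_functions_only = trusted_functions_only,
                  ignore_unclassified_functions = ignore_unclassified_functions,
                  rescale_tpm = rescale_tpm,
                  rescale_copy_number = rescale_copy_number))
    # Merge the per-taxon subsets back into a single SQM object.
    return(combineSQM(subs, tax_source = "contigs",
                      trusted_functions_only = trusted_functions_only,
                      ignore_unclassified_functions = ignore_unclassified_functions,
                      rescale_tpm = rescale_tpm,
                      rescale_copy_number = rescale_copy_number))
    }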

A usage example would be: subsetTaxMulti(SQM, "phylum", c("Proteobacteria", "Bacteroidetes"))
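The pattern behind the helper above (subset once per taxon with lapply, then merge the pieces) can be seen in isolation on a plain data frame. This is only a base R illustration under stated assumptions: the `toy` table and the `subset_one` and `subset_multi` helpers are hypothetical stand-ins, and nothing here calls SQMtools or reproduces what subsetTax and combineSQM do internally.

```r
# Toy table standing in for per-contig taxonomy (hypothetical data, not SQMtools).
toy <- data.frame(
  contig = c("c1", "c2", "c3", "c4", "c5"),
  phylum = c("Ascomycota", "Basidiomycota", "Proteobacteria",
             "Ascomycota", "Chytridiomycota"),
  stringsAsFactors = FALSE
)

# Subset for a single taxon: the analogue of one subsetTax() call.
subset_one <- function(df, tax) df[df$phylum == tax, , drop = FALSE]

# Subset for several taxa: one subset per taxon, then merge the pieces,
# the analogue of lapply() followed by combineSQM().
subset_multi <- function(df, taxa) {
  pieces <- lapply(taxa, function(tax) subset_one(df, tax))
  do.call(rbind, pieces)
}

fungi <- subset_multi(toy, c("Ascomycota", "Basidiomycota", "Chytridiomycota"))
print(fungi)
```

The rows come back grouped by the order of the requested taxa rather than in their original order, which is a side effect of merging one piece per taxon.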

Anyway, I will look into upgrading the original function to behave like this too.

glenjasper commented 10 months ago

How fantastic! Thank you very much Fernando! =D

fpusan commented 10 months ago

Reopening until I actually implement it