jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0
348 stars 81 forks source link

SQMLite from functional subset with sqmreads2tables.py - All Taxa plotted as "Other" #700

Closed mscarbor closed 12 months ago

mscarbor commented 1 year ago

Hello! I did some subsetting for specific KO numbers and am trying to create taxa plots, but they always show up as "other" despite being annotated as specific taxa within the SQMLite object. A screenshot of an example is here:

Screenshot 2023-06-22 at 3 48 01 PM

Peyton_Landfill_AMO.RData.zip

Any suggestions on how to get the taxa to plot?

fpusan commented 1 year ago

Will look into It. What exact commands did you use for generating the project, subsetting, loading into SQMtools and plotting? And what SqueezeeMeta/SQMtools version did you use?

fpusan commented 1 year ago

Can reproduce

fpusan commented 1 year ago

Yeah, ok, this is the same problem as in #652, but with plotTaxonomy instead of plotFunction. I should have anticipated this by then.

Namely you have two samples in which your subset rendered no reads at all, so the total counts are zero which results in NaN when trying to calculate percentages.

plotTaxonomy(project, count='abund') will work (displaying raw counts instead of percentages)

If you really want to have the samples as percentages (not sure on how to interpret those, as in #652 these should probably be percentages over the TOTAL number of reads instead) you can do as follows.

ta = project$taxa$phylum$abund # assuming you want the phylum level
total_reads = colSums(ta)
## alternatively if you want to normalize by the total number of reads in the sample (instead of in the subset)
##  you can load the full non-subseted project with loadSQMlite and then go like
# total_reads = colSums(FULLproject$taxa$phylum$abund)
percents = 100 * t( t(ta) / total_reads )
percents = percents[, !is.na(colSums(percents))] # remove the samples with zero total counts
plotBars( mostAbundant(percents) ) # and here's your plot

Anyways I'll leave this open so I don't forget to provide a more permanent solution to this

mscarbor commented 1 year ago

Hey again! Thanks for the quick reply. This is all with v1.6.2

For subsetting from the original output, I used:

image

I then created the SQMLite object on a server and downloaded to my local Mac and loaded it with:

load("Peyton_Landfill_AMO.RData")

And created plot with:

plotTaxonomy(project, rank = 'order')

Would it be helpful to provide all the tables used to make the SQMLite object?

fpusan commented 1 year ago

Heh, we crossed messages! I don't need more info, the code above should fix it for you

mscarbor commented 1 year ago

AAHHHHHH! Perfect, thanks so much!!