jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

SQMlite coding error #652

Closed by phlienhart 1 year ago

phlienhart commented 1 year ago

Hello! I am a first-time user of SqueezeMeta, and I am currently writing some code to analyze my data. I am trying to identify the genes involved in methane metabolism and plot the taxonomy and functions that contain those genes. I ran my code on my first SQM object and it worked. However, I am now working with a smaller SQMlite object based on long reads, and the same code does not work on it. I keep getting an error that says "The first argument must be a SQM object". Any ideas what could be wrong? Here are some screenshots of my code and outputs. Thanks so much!

[Two screenshots of the code and its output; images not included]

fpusan commented 1 year ago

The subset methods work only for SQM objects (generated by SqueezeMeta). But see the PDF manual for sqmreads2tables.py for ways of subsetting the results of sqm_longreads.pl projects.

mscarbor commented 1 year ago

Hey Fernando -- Sorry if it's obvious, but can you point Peyton and me to the correct PDF Manual you refer to above? We're not finding anything in the SqueezeMetaManual_1.6.2.

fpusan commented 1 year ago

It's not obvious; the documentation could be better. See the -q/--query parameter. Running sqmreads2tables.py --doc should give you more info on the query syntax.

mscarbor commented 1 year ago

Excellent. Thank you! Putting it here for posterity:

Part of the SqueezeMeta distribution. 22/07/2021. (c) Fernando Puente-Sánchez, 2019-2020, CNB-CSIC / 2021 SLU.

Generate tabular outputs from sqm_reads.pl or sqm_longreads.pl results.

USAGE: sqm_reads2tables.py [-h] project_path output_dir [-q "QUERY"] [--trusted-functions] [--ignore-unclassified] [--doc]

OPTIONS:
    -q/--query: Optional query for filtering your results (see below)
    --trusted-functions: Include only ORFs with highly trusted KEGG and COG assignments in aggregated functional tables
    --force-overwrite: Write results even if the output directory already exists
    --doc: Show this documentation

QUERY SYNTAX:

mscarbor commented 1 year ago

Based on the above -- is there any way to subset for a list of KEGG IDs?

fpusan commented 1 year ago

Try something like FUN IN [K0001, K0002,... ]
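For a longer list of KEGG IDs, the query string can be assembled programmatically. The following is a minimal sketch, not part of SqueezeMeta: the helper names `build_fun_query` and `build_command` and both paths are hypothetical, the `FUN IN [...]` form simply follows the suggestion above, and the argument order follows the USAGE line quoted earlier in the thread. Check `sqmreads2tables.py --doc` for the authoritative query syntax.

```python
import shlex


def build_fun_query(kegg_ids):
    """Return a query string of the form 'FUN IN [K1, K2, ...]'.

    The format mirrors the maintainer's suggestion in this thread;
    it is an assumption, not a documented guarantee.
    """
    ids = [k.strip() for k in kegg_ids if k.strip()]
    if not ids:
        raise ValueError("at least one KEGG ID is required")
    return "FUN IN [" + ", ".join(ids) + "]"


def build_command(project_path, output_dir, kegg_ids):
    """Assemble the argument list: script, project_path, output_dir, -q QUERY."""
    return ["sqmreads2tables.py", project_path, output_dir,
            "-q", build_fun_query(kegg_ids)]


# KEGG IDs taken from this thread; the two paths are placeholders.
kegg_ids = ["K16157", "K16158", "K16159"]
cmd = build_command("my_project", "my_project_pmo_tables", kegg_ids)

# Print a shell-quoted command line that can be pasted into a terminal.
print(shlex.join(cmd))
```

Building the command as a list and quoting it with `shlex.join` keeps the spaces and brackets in the query from being split or expanded by the shell.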

EorgeKit commented 1 year ago

Hello @fpusan, how would you advise visualising the data in the same way it's done with SQM objects? That is more convenient and easier; the SQMlite description given above is very confusing, haha. Kindly advise.

fpusan commented 1 year ago

For visualizing you can use the same functions as for a SQM object. It's only the subsetting of SQMlite objects that is not supported inside SQMtools; it has to be done when running sqmreads2tables.py.

mscarbor commented 1 year ago

I was hoping to get a bit more help. I tested this approach by subsetting functions when running sqmreads2tables.py, which seems to have worked just fine. I then made an SQMlite object, which also seems to have all the info I'd expect. The plot I get when plotting functions, however, is this:

[Screenshot of the resulting plot; image not included]

The SQMlite object seems to be formatted correctly, and the abundance/name tables look fine.

[Screenshot of the SQMlite object's tables; image not included]

All I am running in R is: plotFunctions(project, count = 'percent')

Any suggestions? Happy to share anything else.

fpusan commented 1 year ago

Can you share the tables directory with me? I will try to reproduce the error

mscarbor commented 1 year ago

Absolutely. Here you go: PMO.zip

fpusan commented 1 year ago

Can reproduce

fpusan commented 1 year ago

I think that the issue comes from your selected functions having 0 total abundance in some of the samples. This leads to a division by zero when calculating the percentages. This is not possible when running sqm_reads.pl or sqm_longreads.pl, but becomes a thing after subsetting certain taxa or functions while running sqmreads2tables.py.

plotFunctions(project, count = "abund") works; it will give you the raw counts of those functions in your samples.

In any case this percentage would not make a lot of sense, as it would be calculated over the total counts of those selected functions, rather than over the total number of reads from each sample.

To get a proper percentage, you should rather run sqmreads2tables.py without a filtering query, so that all the reads are considered. Then you can load this with loadSQMlite and plot the percent abundance of those particular functions with

plotFunctions(project_without_subsetting, count = "percent", fun = c("K16161", "K16157", "K16159", "K16160", "K16158", "K16162"))
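The difference between the two denominators discussed above can be illustrated with a little arithmetic. This is only a sketch in plain Python with made-up counts; it is not the SQMtools implementation, and the `percent` helper is hypothetical. It shows that a sample in which all selected functions have zero counts has a zero denominator, so the percentage is undefined (R evaluates 0/0 as NaN), while dividing by the total reads per sample stays well defined.

```python
import math


def percent(counts, totals):
    """Percentage of each count over its sample total; NaN where the total is 0."""
    return [100.0 * c / t if t > 0 else math.nan
            for c, t in zip(counts, totals)]


# Hypothetical counts of two selected functions in three samples.
selected = {
    "K16157": [12, 0, 3],
    "K16158": [8, 0, 1],
}

# Hypothetical total read counts per sample, as in an unfiltered run.
total_reads = [10000, 8000, 12000]

# Per-sample totals of the selected functions only, as in filtered tables.
# The second sample sums to zero, which is the problematic case.
subset_totals = [sum(col) for col in zip(*selected.values())]

for fun, counts in selected.items():
    print(fun, "over selected functions:", percent(counts, subset_totals))
    print(fun, "over all reads:", percent(counts, total_reads))
```

With the filtered tables, the second sample yields NaN for every function, which is consistent with the broken plot; with the unfiltered totals it is simply 0%.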