SQMlite coding error - GitHub issues

phlienhart commented 1 year ago

Hello! I am a first-time user of SqueezeMeta, and I am currently writing some code to analyze my data. I am trying to identify the genes involved in methane metabolism and plot the taxonomy and functions associated with those genes. I ran my code on my first SQM object and it worked. However, I am now working with a smaller SQMlite object based on long reads, and the same code I used for the original data does not work on it. I keep getting an error that says "The first argument must be a SQM object". Any ideas what could be wrong? Here are some screenshots of my code and outputs. Thanks so much!

fpusan commented 1 year ago

The subset methods work only for SQM objects (generated by SqueezeMeta). However, see the PDF manual for sqmreads2tables.py for ways of subsetting the results of sqm_longreads.pl projects.

mscarbor commented 1 year ago

Hey Fernando, sorry if it's obvious, but can you point Peyton and me to the PDF manual you refer to above? We're not finding anything in SqueezeMetaManual_1.6.2.

fpusan commented 1 year ago

It's not obvious; the documentation could be better. See the -q/--query parameter. Running sqmreads2tables.py --doc should give you more info on the query syntax.

mscarbor commented 1 year ago

Excellent. Thank you! Putting it here for posterity:

Part of the SqueezeMeta distribution. 22/07/2021. (c) Fernando Puente-Sánchez, 2019-2020, CNB-CSIC / 2021 SLU.

Generate tabular outputs from sqm_reads.pl or sqm_longreads.pl results.

USAGE: sqm_reads2tables.py [-h] project_path output_dir [-q "QUERY"] [--trusted-functions] [--ignore-unclassified] [--doc]

OPTIONS:
- -q/--query: Optional query for filtering your results (see below)
- --trusted-functions: Include only ORFs with highly trusted KEGG and COG assignments in aggregated functional tables
- --force-overwrite: Write results even if the output directory already exists
- --doc: Show this documentation

QUERY SYNTAX:

Please enclose query strings within double quotes.
Queries are combinations of relational operations in the form of <SUBJECT> <OPERATOR> <VALUE>
(e.g. "PHYLUM == Bacteroidetes") joined by logical operators (AND, OR).
Values are case-sensitive.
Parentheses can be used to group operations together.
The "AND" and "OR" logical operators can't appear together in the same expression; parentheses must be used to separate them into different expressions. For example, "GENUS == Escherichia OR GENUS == Prevotella AND FUN CONTAINS iron" would not be valid. Instead, use parentheses to write either:
- "(GENUS == Escherichia OR GENUS == Prevotella) AND FUN CONTAINS iron" -> reads from either Escherichia or Prevotella that contain annotations related to iron.
- "GENUS == Escherichia OR (GENUS == Prevotella AND FUN CONTAINS iron)" -> all reads from Escherichia, plus reads from Prevotella that contain annotations related to iron.
Another example query would be: "(PHYLUM == Bacteroidetes OR CLASS IN [Alphaproteobacteria, Gammaproteobacteria]) AND FUN CONTAINS iron"
- This would select all the reads assigned to either the Bacteroidetes phylum or the Alphaproteobacteria or Gammaproteobacteria classes, that also contain the substring "iron" in the functional annotations.
Possible subjects are:
- FUN: search within all the combined databases used for functional annotation.
- FUNH: search within the KEGG BRITE and COG functional hierarchies (e.g. "FUNH CONTAINS Carbohydrate metabolism" will select all the reads containing a gene associated with the broad "Carbohydrate metabolism" category)
- SUPERKINGDOM, PHYLUM, CLASS, ORDER, FAMILY, GENUS, SPECIES: search within the taxonomic annotation at the requested taxonomic rank.
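The AND/OR rule above is easy to trip over when queries are assembled programmatically. Below is a minimal base-R sketch of one way to keep them valid: `build_query` is a hypothetical helper (not part of SqueezeMeta or SQMtools) that ORs several taxa at one rank, wraps that group in parentheses, and only then ANDs it with a `FUN CONTAINS` condition.

```r
# build_query() is a hypothetical helper, not part of SqueezeMeta or SQMtools.
# It ORs several taxa at one rank, wraps that group in parentheses, and ANDs
# it with a FUN CONTAINS condition, so AND and OR never share an expression.
build_query <- function(rank, taxa, fun_term) {
  taxon_terms <- paste(rank, "==", taxa)   # "GENUS == Escherichia", ...
  taxon_group <- paste0("(", paste(taxon_terms, collapse = " OR "), ")")
  paste(taxon_group, "AND FUN CONTAINS", fun_term)
}

q <- build_query("GENUS", c("Escherichia", "Prevotella"), "iron")
print(q)
# [1] "(GENUS == Escherichia OR GENUS == Prevotella) AND FUN CONTAINS iron"
```

The resulting string matches the first valid example in the documentation and would be passed, in double quotes, to the -q/--query option.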

mscarbor commented 1 year ago

Based on the above, is there any way to subset for a list of KEGG IDs?

fpusan commented 1 year ago

Try something like FUN IN [K0001, K0002,... ]
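For a longer list, the bracketed query can be generated rather than typed. This is a hedged base-R sketch that follows the `CLASS IN [...]` form from the `--doc` output; the KEGG IDs are the ones used later in this thread, and `project_path`/`output_dir` are placeholders from the USAGE line. The command is only built and printed, not run, and the script name may differ between SqueezeMeta versions (the `--doc` text itself says sqm_reads2tables.py).

```r
# Hypothetical sketch: turn a vector of KEGG IDs into a "FUN IN [...]" query.
# The IDs below are the ones used later in this thread; swap in your own.
kegg_ids <- c("K16161", "K16157", "K16159", "K16160", "K16158", "K16162")
query <- paste0("FUN IN [", paste(kegg_ids, collapse = ", "), "]")

# Build (but do not run) the command line; "project_path" and "output_dir"
# are placeholders, and the query is enclosed in double quotes as --doc asks.
cmd <- sprintf('sqmreads2tables.py %s %s -q "%s"', "project_path", "output_dir", query)
cat(cmd, "\n")
```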

EorgeKit commented 1 year ago

Hello @fpusan, how would you advise visualising the results in the same way it's done with SQM objects? That is more convenient and easier; the SQMlite description given above is very confusing, haha. Kindly advise.

fpusan commented 1 year ago

For visualizing, you can use the same functions as for a SQM object. Only the subsetting of SQMlite objects is not supported inside SQMtools; it has to be done when running sqmreads2tables.py.

mscarbor commented 1 year ago

I was hoping to get a bit more help. I tested this approach by subsetting functions with sqmreads2tables.py, which seems to have worked just fine. I then made an SQMlite object, which also seems to have all the info I'd expect. The plot I get when plotting functions, however, is this:

The SQMlite object seems to be formatted correctly, and the abundance/name tables look fine.

All I am running in R is: plotFunctions(project, count = 'percent')

Any suggestions? Happy to share anything else.

fpusan commented 1 year ago

Can you share the tables directory with me? I will try to reproduce the error.

mscarbor commented 1 year ago

Absolutely. Here you go: PMO.zip

fpusan commented 1 year ago

Can reproduce

fpusan commented 1 year ago

I think the issue comes from your selected functions having 0 total abundance in some of the samples. This leads to a division by zero when calculating the percentages. This can't happen with the unfiltered output of sqm_reads.pl or sqm_longreads.pl, but it becomes possible after subsetting certain taxa or functions while running sqmreads2tables.py.
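The arithmetic behind this can be reproduced in a few lines of base R. This is a sketch of a generic per-sample percentage on made-up counts, not SQMtools' internal code; it only shows how a sample whose selected functions sum to zero turns into NaN (0/0), which is presumably what breaks the percent plot.

```r
# Toy abundance table with made-up counts: rows are the selected functions,
# columns are samples. Sample S2 has no reads for any selected function,
# as can happen after filtering with a query.
abund <- matrix(c(12, 0, 5,
                   3, 0, 7),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("fun1", "fun2"), c("S1", "S2", "S3")))

# Per-sample percentages: divide each column by its own total.
pct <- sweep(abund, 2, colSums(abund), "/") * 100
print(pct)
```

Columns S1 and S3 come out as ordinary percentages, while every value in S2 is NaN because its column total is 0.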

plotFunctions(project, count = "abund") works; it will give you the raw counts of those functions in your samples.

In any case, this percentage would not make much sense, as it would be calculated over the total counts of those selected functions rather than over the total number of reads in each sample.

To get a proper percentage, you should instead run sqmreads2tables.py without a filtering query, so that all the reads are considered. Then you can load the result with loadSQMlite and plot the percent abundance of those particular functions with

plotFunctions(project_without_subsetting, count = "percent", fun = c("K16161", "K16157", "K16159", "K16160", "K16158", "K16162"))
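Why the denominator matters can be sketched with toy numbers in base R. This is an illustration, not SQMtools code: the counts are made up, and it assumes that the column totals of the unfiltered table stand in for the total reads per sample. Case (a) mimics percentages computed on a query-filtered table; case (b) mimics computing percentages on the full table and then keeping only the functions of interest, which is roughly what the plotFunctions call above does.

```r
# Toy "full" table with made-up counts. The "other" row stands in for all the
# reads not assigned to the selected functions, so each column total plays
# the role of the total reads in that sample (an assumption of this sketch).
full <- matrix(c( 10,  20,
                   5,   0,
                 985, 980),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("fun1", "fun2", "other"), c("S1", "S2")))
selected <- c("fun1", "fun2")

percent_of_columns <- function(m) sweep(m, 2, colSums(m), "/") * 100

# (a) Subset first, then take percentages: denominator = selected functions only.
pct_subset <- percent_of_columns(full[selected, , drop = FALSE])

# (b) Take percentages on the full table, then keep the selected rows:
#     denominator = all counts in the sample.
pct_total <- percent_of_columns(full)[selected, , drop = FALSE]

print(pct_subset)  # fun1 is 100% of S2 here
print(pct_total)   # fun1 is 2% of S2 here
```

In (a) each column always sums to 100 (or is NaN when empty), so it says nothing about how abundant the selected functions are in the sample; (b) keeps that information.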

jtamames/SqueezeMeta, issue #652: SQMlite coding error