The subset methods work only for SQM objects (generated by SqueezeMeta). But see the PDF manual for sqmreads2tables.py for ways of subsetting the results of sqm_longreads.pl projects.
Hey Fernando -- Sorry if it's obvious, but can you point Peyton and me to the correct PDF Manual you refer to above? We're not finding anything in the SqueezeMetaManual_1.6.2.
It's not obvious; the documentation could be better. See the -q/--query parameter. Running sqmreads2tables.py --doc should give you more info on the query syntax.
Excellent. Thank you! Putting it here for posterity:
Part of the SqueezeMeta distribution. 22/07/2021. (c) Fernando Puente-Sánchez, 2019-2020, CNB-CSIC / 2021 SLU.
Generate tabular outputs from sqm_reads.pl or sqm_longreads.pl results.
USAGE: sqm_reads2tables.py [-h] project_path output_dir [-q "QUERY"] [--trusted-functions] [--ignore-unclassified] [--doc]
OPTIONS:
-q/--query: Optional query for filtering your results (see below)
--trusted-functions: Include only ORFs with highly trusted KEGG and COG assignments in aggregated functional tables
--force-overwrite: Write results even if the output directory already exists
--doc: Show this documentation
QUERY SYNTAX:
Please enclose query strings within double brackets.
Queries are combinations of relational operations in the form of SUBJECT OPERATOR VALUE (e.g. "GENUS == Escherichia"), joined by the logical operators AND and OR.
Values are case-sensitive
Parentheses can be used to group operations together.
The "AND" and "OR" logical operators can't appear together in the same expression. Parentheses must be used to separate them into different expressions. e.g.:
"GENUS == Escherichia OR GENUS == Prevotella AND FUN CONTAINS iron" would not be valid. Parentheses must be used to write either:
"(GENUS == Escherichia OR GENUS == Prevotella) AND FUN CONTAINS iron" -> reads from either Escherichia or Prevotella which contain annotations related to iron.
"GENUS == Escherichia OR (GENUS == Prevotella AND FUN CONTAINS iron)" -> reads from Escherichia, and reads from Prevotella which contain annotations related to iron.
Another example query would be: "(PHYLUM == Bacteroidetes OR CLASS IN [Alphaproteobacteria, Gammaproteobacteria]) AND FUN CONTAINS iron"
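To make the AND/OR rule above concrete, here is a minimal Python sketch that flags queries mixing both operators at the same parenthesis level. It is only an illustration of the rule as the documentation states it, not the parser used by the script; the function name mixes_and_or is made up, and it assumes that AND and OR never appear as values.

```python
def mixes_and_or(query: str) -> bool:
    """Return True if AND and OR appear together at the same parenthesis level.

    Illustrative check only (hypothetical helper, not part of SqueezeMeta).
    Assumes the words AND / OR are only ever used as logical operators.
    """
    # One set of seen logical operators per open parenthesis group;
    # index 0 is the top level of the query.
    seen = [set()]
    # Pad parentheses with spaces so they become separate tokens.
    for token in query.replace("(", " ( ").replace(")", " ) ").split():
        if token == "(":
            seen.append(set())
        elif token == ")":
            if len(seen) == 1:
                raise ValueError("unbalanced parentheses in query")
            if {"AND", "OR"} <= seen.pop():
                return True
        elif token in ("AND", "OR"):
            seen[-1].add(token)
    if len(seen) != 1:
        raise ValueError("unbalanced parentheses in query")
    return {"AND", "OR"} <= seen[0]


bad = "GENUS == Escherichia OR GENUS == Prevotella AND FUN CONTAINS iron"
good = "(GENUS == Escherichia OR GENUS == Prevotella) AND FUN CONTAINS iron"
print(mixes_and_or(bad), mixes_and_or(good))
```

The first query mixes both operators at the top level and is flagged; the second keeps OR inside its own parenthesized group, so it passes.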
Possible subjects are:
Based on the above -- is there any way to subset for a list of KEGG IDs?
Try something like FUN IN [K0001, K0002,... ]
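For a long list of KEGG IDs it may be easier to build the query string programmatically. A small Python sketch, assuming the FUN IN [...] form suggested above; the helper name kegg_query and the paths my_project and my_tables are placeholders, and only the -q flag from the documentation is used.

```python
import shlex


def kegg_query(kegg_ids):
    """Build a 'FUN IN [...]' query string from a list of KEGG IDs.

    Hypothetical helper for illustration; not part of SqueezeMeta.
    """
    ids = list(kegg_ids)
    if not ids:
        raise ValueError("need at least one KEGG ID")
    return "FUN IN [" + ", ".join(ids) + "]"


# Example IDs; replace with your own list.
query = kegg_query(["K16157", "K16158", "K16159"])

# Placeholder project and output paths. shlex.join quotes the query
# so it reaches the script as a single argument.
cmd = ["sqmreads2tables.py", "my_project", "my_tables", "-q", query]
print(shlex.join(cmd))
```

The printed line can be pasted into a shell; passing cmd as a list to subprocess.run would avoid the quoting step altogether.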
Hello @fpusan, how would you advise visualising the results the same way it's done with SQM objects? That is more convenient and easier; the SQMlite description given above is very confusing haha. Kindly advise.
For visualizing you can use the same functions as for a SQM object. It's only the subsetting of SQMlite objects which is not supported inside SQMtools and has to be done when running sqmreads2tables.py
I was hoping to get a bit more help. I tested this approach by subsetting functions when running sqmreads2tables.py, which seems to have worked just fine. I then made an SQMlite object, which also seems to have all the info I'd expect. The plot I get when plotting functions, however, is this:
The SQMlite object seems to be formatted correctly, and the abundance/name tables look fine.
All I am running in R is: plotFunctions(project, count = 'percent')
Any suggestions? Happy to share anything else.
Can you share the tables directory with me? I will try to reproduce the error
Can reproduce
I think that the issue comes from your selected functions having 0 total abundance in some of the samples. This leads to a division by zero when calculating the percentages. This is not possible when running sqm_reads.pl or sqm_longreads.pl, but becomes a thing after subsetting certain taxa or functions while running sqmreads2tables.py.
plotFunctions(project, count = "abund") works; it will give you the raw counts of those functions in your samples.
In any case this percentage would not make a lot of sense, as it would be calculated over the total counts of those selected functions, rather than over the total number of reads from each sample.
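The arithmetic behind both points can be sketched in a few lines of Python. This is not how SQMtools computes percentages internally; the counts, sample names, and read totals below are made up purely to show the effect of the denominator.

```python
def percentages(counts, totals):
    """Percent of each function per sample; None where the sample total is 0.

    counts: {function: {sample: count}}, totals: {sample: denominator}.
    Hypothetical helper for illustration only.
    """
    result = {}
    for fun, per_sample in counts.items():
        result[fun] = {
            sample: (100 * n / totals[sample] if totals[sample] else None)
            for sample, n in per_sample.items()
        }
    return result


# Made-up counts for two selected functions in two samples.
counts = {
    "K16157": {"S1": 3, "S2": 0},
    "K16158": {"S1": 1, "S2": 0},
}

# Denominator 1: total counts of the selected functions only (subset tables).
subset_totals = {s: sum(c[s] for c in counts.values()) for s in ("S1", "S2")}
# Denominator 2: made-up total number of reads per sample (unfiltered tables).
read_totals = {"S1": 1000, "S2": 800}

over_subset = percentages(counts, subset_totals)
over_reads = percentages(counts, read_totals)
print(over_subset)
print(over_reads)
```

In sample S2 the selected functions sum to zero, so the subset-based percentage is undefined (None here, a division by zero otherwise), while the read-based percentage is simply 0. In S1 the subset-based value only says how the selected functions split among themselves, not how abundant they are in the sample.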
To get a proper percentage, you should rather run sqmreads2tables.py without a filtering query, so all the reads are considered. Then you can load this with loadSQMlite and plot the percent abundance of those particular functions with
plotFunctions(project_without_subsetting, count = "percent", fun = c("K16161", "K16157", "K16159", "K16160", "K16158", "K16162"))
Hello! I am a first-time user of SqueezeMeta, and I am currently writing some code to analyze my data. I am trying to identify the genes involved in methane metabolism and plot the taxonomy and functions that contain those genes. I ran my code on my first SQM object and it worked. However, I am now using a smaller SQMlite object based on long reads, and the same code I used for the original data did not work on it. I keep getting an error that says "The first argument must be a SQM object". Any ideas what could be wrong? Here are some screenshots of my code and outputs. Thanks so much!