jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Understanding KEGG pathways output #843

Open · da-shanmugapriya opened this issue 1 month ago

da-shanmugapriya commented 1 month ago

Hi Team,

I want to understand the output of the 20.. kegg.pathways file. The SqueezeMeta manual says that column 3 gives the number of KEGG pathways found, and that columns 4 and beyond indicate whether each pathway was predicted to be present. If a pathway was predicted to be present, its cell shows the number of enzymes found in that pathway.

However, when I count the predicted KEGG pathways for each bin in the attached file, the total does not match the number in column 3. For example, concoct.22.fa.contigs has 6 KEGG pathways, but all of its pathway columns are NF. Can you shed some light on this?
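To make the mismatch concrete, here is a minimal Python sketch of the check described above. It assumes the file is tab-separated and that each data row has the pathways-found count in column 3 and one cell per pathway from column 4 on (an enzyme count or NF); columns 1 and 2 are ignored. Header rows must be skipped before calling it. The function name `check_row` and the example row are illustrative only, not real SqueezeMeta output.

```python
def check_row(line: str) -> tuple[int, int]:
    """Compare the reported pathway count with the number of non-NF cells.

    Hypothetical helper. Assumes a tab-separated data row where column 3
    (index 2) is the number of pathways found and columns 4+ (index 3 on)
    hold one cell per pathway: an enzyme count, or "NF" if not predicted.
    """
    fields = line.rstrip("\n").split("\t")
    reported = int(fields[2])
    # Count cells holding a value other than "NF"; empty cells are skipped too.
    counted = sum(1 for cell in fields[3:] if cell.strip() not in ("", "NF"))
    return reported, counted


if __name__ == "__main__":
    # Illustrative row shaped like the concoct.22.fa.contigs case (not real data).
    row = "concoct.22.fa.contigs\t-\t6\t" + "\t".join(["NF"] * 10)
    reported, counted = check_row(row)
    print(f"reported={reported} counted={counted}")  # prints "reported=6 counted=0"
```

A row like the one above, where `reported` is larger than `counted`, is exactly the situation the question describes.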

Also, what do the values in the second row mean?

kegg_pathway_issue.zip

da-shanmugapriya commented 1 month ago

Hello Team,

I have gone through the SqueezeMeta manual, but I would greatly appreciate any further insight you can give me to understand the output.

Thanks,
Shanmugapriya

jtamames commented 1 month ago

Hello,

I see. Indeed, there are no reported pathways. Let me take a look at this.

Best,

jtamames commented 1 month ago

Hello,

I have revised the code for this script. The "pathways found" column gives the number of pathways for which at least one gene was found. However, to be listed as present (not NF) in the pathway columns, a pathway must have at least a minimum number of genes found (5 by default) and at least a minimum fraction of its genes found (10% by default).
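The two criteria above can be sketched in Python to show how a bin can report 6 pathways found while every pathway cell is NF. This is a hypothetical illustration, not the code of 20.minpath.pl: the function names, the pathway names, and the use of inclusive (>=) comparisons are assumptions. `min_fraction` plays the role of `$minfraction20` in parameters.pl, and the cell value is taken to be the gene count.

```python
def pathway_status(genes_found: int, genes_in_pathway: int,
                   min_genes: int = 5, min_fraction: float = 0.10) -> str:
    """Cell value for one pathway: the gene count if 'present', else "NF".

    Assumes both thresholds are inclusive (>=); the exact comparison used by
    20.minpath.pl is not stated in the thread.
    """
    if genes_in_pathway <= 0:
        return "NF"
    fraction = genes_found / genes_in_pathway
    if genes_found >= min_genes and fraction >= min_fraction:
        return str(genes_found)
    return "NF"


def summarize(pathways: dict[str, tuple[int, int]],
              min_genes: int = 5, min_fraction: float = 0.10):
    """Return (pathways_found, cells) for one bin.

    pathways maps a pathway name to (genes_found, genes_in_pathway).
    "Pathways found" counts any pathway with at least one gene, which is why
    it can be larger than the number of non-NF cells.
    """
    found = sum(1 for genes, _ in pathways.values() if genes >= 1)
    cells = {name: pathway_status(genes, total, min_genes, min_fraction)
             for name, (genes, total) in pathways.items()}
    return found, cells


if __name__ == "__main__":
    # Hypothetical bin: six pathways, each with fewer than 5 genes found.
    bin_pathways = {"A": (3, 20), "B": (1, 8), "C": (2, 50),
                    "D": (4, 30), "E": (1, 12), "F": (2, 15)}
    found, cells = summarize(bin_pathways)
    print(found, cells)  # found is 6, and every cell is "NF"
```

Lowering `min_fraction` does not change the result for this bin, because the minimum-gene threshold still rejects every pathway, and that is the threshold that cannot be set in the original parameters.pl.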

I see that the two criteria are inconsistent with each other (a pathway can be counted as found yet reported as NF), and I will fix that in upcoming versions.

In addition, the minimum fraction of the pathway's genes can be set in parameters.pl ($minfraction20), but the minimum number of genes cannot be set.

I am attaching a fix for these problems, containing new 20.minpath.pl and parameters.pl files. Just replace the old scripts in the scripts directory of your SqueezeMeta installation and run step 20 again. If you want to change the parameters, also copy parameters.pl to the project directory.
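If you apply the fix on more than one installation, the file replacement can be scripted. The sketch below is a hypothetical Python helper, not part of SqueezeMeta: it assumes 20fix.tar.gz has already been extracted into a directory that contains 20.minpath.pl and parameters.pl at its top level, and that you pass the paths of your SqueezeMeta scripts directory and (optionally) your project directory yourself. Rerunning step 20 is left to your usual SqueezeMeta command.

```python
import shutil
from pathlib import Path


def _copy_with_backup(src: Path, dst: Path) -> None:
    """Copy src over dst, first saving any existing dst as dst.bak."""
    if dst.exists():
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)


def install_fix(fix_dir, scripts_dir, project_dir=None,
                files=("20.minpath.pl", "parameters.pl")) -> None:
    """Hypothetical helper: install the fixed scripts into scripts_dir.

    fix_dir is assumed to be the extracted contents of 20fix.tar.gz.
    If project_dir is given, parameters.pl is also copied there so that its
    parameters can be edited for that project.
    """
    fix_dir, scripts_dir = Path(fix_dir), Path(scripts_dir)
    for name in files:
        _copy_with_backup(fix_dir / name, scripts_dir / name)
    if project_dir is not None:
        _copy_with_backup(fix_dir / "parameters.pl",
                          Path(project_dir) / "parameters.pl")
```

Keeping a .bak copy of every overwritten file makes it easy to roll back if the new scripts do not match your SqueezeMeta version.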

20fix.tar.gz

Best, J