Open tpellegrinetti opened 5 days ago
The *.sum.cand file lists all the potential candidates for each step in each pathway, along with a score (2 for high confidence, 1 for medium confidence, 0 for low confidence).
The *.sum.steps file lists all the steps in each pathway, along with the best candidate, its score (if there is a candidate), and whether or not this step is on the best path.
The *.sum.rules file lists the number of high-, medium-, and low-confidence steps for each rule. When analyzing many genomes, I usually look only at the rows with rule="all" (meaning the totals for that entire biosynthetic pathway).
Depending on how you set up your run, the orgId or gid (genome id) values in that table may be hash-based strings, and the orgs.org table may explain what they mean.
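For a batch of genomes, the rule="all" filtering described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the *.sum.rules file is tab-delimited with a header row, and that the count columns are named nHi, nMed, and nLo (only gid, orgId, and rule are named in this thread, so check the header of your own file). The example rows, including the MAG ids and the "ornithine" rule name, are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_rules(handle):
    """Return {gid: {pathway: (nHi, nMed, nLo)}} using only rows with rule == "all".

    Assumes a tab-delimited table with a header row and columns named
    gid, pathway, rule, nHi, nMed, nLo (the count column names are assumptions).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    summary = defaultdict(dict)
    for row in reader:
        if row["rule"] != "all":
            continue  # skip per-rule rows; keep only the pathway totals
        counts = (int(row["nHi"]), int(row["nMed"]), int(row["nLo"]))
        summary[row["gid"]][row["pathway"]] = counts
    return dict(summary)


def pathways_with_low_confidence(summary):
    """List (gid, pathway) pairs whose totals include at least one low-confidence step."""
    return sorted(
        (gid, pathway)
        for gid, pathways in summary.items()
        for pathway, (_n_hi, _n_med, n_lo) in pathways.items()
        if n_lo > 0
    )


# Made-up rows standing in for a real aa.sum.rules file.
EXAMPLE = (
    "gid\tpathway\trule\tnHi\tnMed\tnLo\n"
    "MAG_001\targ\tall\t8\t1\t0\n"
    "MAG_001\targ\tornithine\t4\t0\t0\n"
    "MAG_002\this\tall\t7\t0\t2\n"
)

summary = summarize_rules(io.StringIO(EXAMPLE))
flagged = pathways_with_low_confidence(summary)
print(flagged)
```

On a real run you would pass an open file instead of the in-memory example, e.g. `with open("aa.sum.rules") as fh: summary = summarize_rules(fh)`. A pathway flagged this way is a lead to inspect in the *.sum.steps and *.sum.cand files, not by itself evidence of auxotrophy.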
Hi, thanks for creating this amazing tool!
I have 90 nearly complete MAGs, and I'm looking to identify amino acid auxotrophy using GapMind. Since the web version is challenging to use with so many genomes, I'm using the command-line version.
Following the tutorial, I noticed it generates several tables:
aa.hits
aa.revhits
aa.sum.cand
aa.sum.rules
aa.sum.steps
orgs.faa
orgs.org
I’m finding this output a bit confusing. Could you clarify:
Which table should I check to identify candidates with high, medium, and low confidence?
What does each table represent?
Thanks in advance!