Closed mmpust closed 3 years ago
Hi Marie,
these are valid questions :) Here's how to interpret these logs:
Best Silvio
Dear Silvio, thank you for this very fast and helpful response!
Answer 1: How do I interpret this biologically? If the genes GLY1 and ltaE are detected in a bacterial genome, then the bacterium has the genetic potential to carry out the THREONINE-ALDOLASE-RXN. In this case, the pathway completeness would be 50 % and not 33 %, is that correct? Also, very often a pathway has just two reactions assigned to it, and the two reactions differ only in their EC number but have the same genes and reaction name. So gapseq assigns a 50 % completeness value if the genome has sequences that match one of the EC entries. However, the pathway is probably 100 % complete if the genes are present and a good Blast hit for one EC number is found? Does this make sense?
Answer 2: This is very clear now, thanks!
Kind regards, Marie
Dear Marie, thanks for your careful evaluation :) In cases of two ECs for one reaction, we interpreted the individual ECs as sub-reactions of the overall reaction, which is why we split them into two in the prediction. But in the specific case you showed, it looks more like an instance of two different EC numbers that both catalyse the same reaction (i.e. same stoichiometry) but with different specificity for the focal substrate. We will recheck how we are handling such cases and may make some adjustments.
Thank you! Great. I just sent you one other example (one that occurs quite often in this form). Maybe this helps in your evaluation process. Here, gapseq assigns a 50% completeness score but all the genes are present for both reactions.
89/1779: Checking for pathway |GLUTAMINDEG-PWY| L-glutamine degradation I with 2 reactions
(Degradation,Amino-Acid-Degradation,Proteinogenic-Amino-Acids-Degradation,GLUTAMINE-DEG)
1) GLUTAMIN-RXN glutaminase 3.5.1.38 GLS Gls2 SNZ1 SNO1 ybaS yneH asnB glsA glsB
Merge sequence data from 3.5.1.38.fasta and GLUTAMIN-RXN.fasta
/tmp/tmp.vXA36kq4Pj (8 sequences)
Blast hit (1x)
bit=241 id=42.667 cov=83 hit=UniRef90_O68897
Candidate reaction for import: 4
2) GLUTAMIN-RXN glutaminase 3.5.1.2 GLS Gls2 SNZ1 SNO1 ybaS yneH asnB glsA glsB
Merge sequence data from 3.5.1.2.fasta and GLUTAMIN-RXN.fasta
/tmp/tmp.VPwJaGFO5h (487 sequences)
check subunits: 3
total subunits found: 0 / 3
NO hit because of missing subunits
Pathway completeness: 1/2 (50%)
Hits with candidate reactions in database: 1/2
Key reactions: 0/0
Kind regards, Marie
Another interesting case:
https://metacyc.org/META/NEW-IMAGE?type=PATHWAY&object=PWY-6966
Here, the two EC-numbers refer to different subunits. Currently, gapseq does not predict the pathway/reaction e.g. for Methylococcus capsulatus Bath (GCF_000008325.1), for which the pathway is expected (#66).
hi @mmpust, sorry it took a bit longer to come up with a fix for this!
I revised the behavior of gapseq so that multiple MetaCyc EC numbers are now treated as alternatives belonging to the same reaction.
In your example PWY-5436 (L-threonine degradation IV), this means that only two reactions are considered and the pathway prediction should work more reasonably!
The new approach should also work for more than two alternative EC numbers (e.g. ANAPHENOXI-PWY).
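To make the difference concrete, here is a minimal Python sketch of the two counting schemes. It is only an illustration, not gapseq's actual implementation: the `ec_results` list and both function names are hypothetical, and the hit/no-hit values are simply copied from the GLUTAMINDEG-PWY log above.

```python
from collections import defaultdict

# Hypothetical per-EC search results, modelled on the GLUTAMINDEG-PWY log:
# (reaction id, EC number, was a hit found?)
ec_results = [
    ("GLUTAMIN-RXN", "3.5.1.38", True),   # Blast hit
    ("GLUTAMIN-RXN", "3.5.1.2", False),   # no hit because of missing subunits
]


def completeness_per_ec(results):
    """Count every (reaction, EC) entry as a separate step."""
    found = sum(1 for _, _, hit in results if hit)
    return found, len(results)


def completeness_per_reaction(results):
    """Treat the EC numbers of one reaction as alternatives:
    a hit for any of them marks the reaction as found."""
    by_reaction = defaultdict(bool)
    for rxn, _, hit in results:
        by_reaction[rxn] = by_reaction[rxn] or hit
    found = sum(by_reaction.values())
    return found, len(by_reaction)


print(completeness_per_ec(ec_results))        # (1, 2)
print(completeness_per_reaction(ec_results))  # (1, 1)
```

Under these assumptions, the per-EC count gives the 1/2 seen in the log, while grouping by reaction id gives 1/1, and the grouping works the same way for any number of alternative EC numbers.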
Hope it works as expected now. Thank you for pointing this out :)
Awesome, thank you!
Hello,
Thank you very much in advance! Marie