Thanks for the question. We used to have a hierarchy when we had Hotpep: dbCAN > Hotpep > DIAMOND. Now that Hotpep has been replaced by dbCAN-sub, I would suggest dbCAN > dbCAN-sub > DIAMOND. So for your first example, I would choose GT2. Your second example is trickier. I would choose CE1, as both dbCAN and dbCAN-sub gave that annotation. For CBM48, however, I would have to look at the HMMER alignment of the dbCAN-sub result. The default e-value and coverage thresholds in the program might be too stringent for short CBMs.
Yanbin
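A minimal Python sketch of the selection rule described in this reply: keep any family reported by at least two tools, and otherwise fall back to the priority order dbCAN (HMMER) > dbCAN-sub > DIAMOND. The function name `resolve_families`, the tool keys, and the representation of each tool's call as a set of family labels are assumptions for illustration. This is not part of run_dbcan and does not parse overview.txt directly.

```python
from collections import Counter
from typing import Dict, Set

# Priority order suggested in the reply: dbCAN (HMMER) > dbCAN-sub > DIAMOND.
# These keys are illustrative labels, not guaranteed run_dbcan column names.
TOOL_PRIORITY = ["HMMER", "dbCAN_sub", "DIAMOND"]


def resolve_families(calls: Dict[str, Set[str]]) -> Set[str]:
    """Hypothetical helper: choose CAZyme families for one gene.

    `calls` maps each tool to the set of families it reported;
    an empty set means the tool returned no annotation.
    """
    # Count how many tools reported each family.
    support = Counter(fam for fams in calls.values() for fam in set(fams))
    consensus = {fam for fam, n in support.items() if n >= 2}
    if consensus:
        return consensus
    # No family is shared by two tools: take the call of the
    # highest-priority tool that reported anything.
    for tool in TOOL_PRIORITY:
        if calls.get(tool):
            return set(calls[tool])
    return set()
```

For the first example, `resolve_families({"HMMER": set(), "dbCAN_sub": {"GT2"}, "DIAMOND": {"GT0"}})` returns `{"GT2"}`. A family reported by only one tool (for instance a CBM48 call from dbCAN-sub alone) is dropped whenever a consensus exists, so such calls still need the manual alignment check mentioned above.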
From: jylee324 · Sent: Tuesday, April 11, 2023 9:52 AM · To: linnabrown/run_dbcan · Subject: [linnabrown/run_dbcan] Inquiry regarding the interpretation of 'overview.txt' (Issue #115)
Hi,
I ran run_dbcan on my metagenome data, selecting the HMMER, dbCAN_sub, and DIAMOND tools, and obtained an overview.txt file. I have a few questions about the results.
As I understand from the help section of the dbCAN webpage, it is recommended to use only annotations supported by at least two of the three tools (HMMER, dbCAN_sub, DIAMOND). However, some cases confuse me.
First, what is the best approach when only two tools annotate a gene and their annotations differ? For example, how should I interpret a case where dbCAN_sub annotates GT2 and DIAMOND annotates GT0 (HMMER returned no annotation for that gene)? Is there a specific hierarchy of reliability among the three methods (HMMER, dbCAN_sub, DIAMOND) that I can follow?
I also encountered a more complex case, shown below.
In this case, all three tools give different results. What is the best way to interpret this? I would like to keep only CE1, which two of the tools share, but is this a valid approach? If there is a better method, please let me know.
Thank you sincerely for developing such a great tool.
Best,
Thank you for your response!
It answers my question precisely.
I am also providing additional information about the second case.
HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
---|---|---|---|---|---|---|---|---|---|
CE1.hmm | 227 | Gene_4 | 409 | 2.9e-55 | 1 | 221 | 159 | 404 | 0.9691629955947136 |
HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
---|---|---|---|---|---|---|---|---|---|
CBM48_e68.hmm\|CBM48:192\|CE0:190\|CE6:12\|GH10:3\|CE1:2\|GH62:1\|3.1.1.73:1\|3.2.1.55:1\|3.2.1.8:1 | 61 | Gene_4 | 409 | 4.5e-20 | 2 | 61 | 40 | 115 | 0.9672131147540983 |
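The first field of the dbCAN-sub row packs several values together, separated by `|`. Reading it as a subfamily HMM name followed by `label:count` pairs (CAZy families, then EC numbers) is an interpretation of this one example, not a documented format. Under that assumption, a small hypothetical parser (`parse_subfamily_field` is an illustrative name) makes the composition easier to inspect:

```python
import re
from typing import Dict, Tuple

# Assumed EC-number shape, e.g. "3.2.1.8" or "3.2.1.-".
EC_RE = r"\d+(\.[\d-]+){3}"


def parse_subfamily_field(field: str) -> Tuple[str, Dict[str, int], Dict[str, int]]:
    """Hypothetical parser for a field like 'CBM48_e68.hmm|CBM48:192|...'.

    Assumes the first token is the subfamily HMM name and every
    remaining token is 'label:count'.
    """
    profile, *tokens = field.split("|")
    families: Dict[str, int] = {}
    ecs: Dict[str, int] = {}
    for token in tokens:
        label, count = token.rsplit(":", 1)
        # Labels that look like EC numbers go to `ecs`, the rest to `families`.
        target = ecs if re.fullmatch(EC_RE, label) else families
        target[label] = int(count)
    return profile, families, ecs
```

Applied to the row above, the family part is dominated by CBM48 (192) and CE0 (190), with CE1 appearing only twice.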
HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
---|---|---|---|---|---|---|---|---|---|
CBM48_e68.hmm | 61 | Gene_4 | 409 | 4.5e-20 | 2 | 61 | 40 | 115 | 0.9672131147540983 |
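The Coverage values in these tables are consistent with (Profile End - Profile Start) / Profile Length: (221 - 1) / 227 ≈ 0.969 for CE1 and (61 - 2) / 61 ≈ 0.967 for CBM48_e68. A minimal sketch that recomputes coverage under that assumed formula and applies e-value and coverage cutoffs; the `HmmHit` class, the function names, and the cutoff values are hypothetical placeholders, not run_dbcan's documented defaults.

```python
from dataclasses import dataclass


@dataclass
class HmmHit:
    """One row of the HMM hit tables above (hypothetical container)."""
    profile: str
    profile_length: int
    gene_id: str
    gene_length: int
    evalue: float
    profile_start: int
    profile_end: int
    gene_start: int
    gene_end: int


def coverage(hit: HmmHit) -> float:
    # Fraction of the HMM profile spanned by the alignment; this
    # reproduces the Coverage column for the rows shown above.
    return (hit.profile_end - hit.profile_start) / hit.profile_length


def passes(hit: HmmHit, max_evalue: float = 1e-15, min_coverage: float = 0.35) -> bool:
    # Placeholder cutoffs for illustration only; they are not taken
    # from run_dbcan's documentation.
    return hit.evalue <= max_evalue and coverage(hit) >= min_coverage


ce1 = HmmHit("CE1.hmm", 227, "Gene_4", 409, 2.9e-55, 1, 221, 159, 404)
cbm48 = HmmHit("CBM48_e68.hmm", 61, "Gene_4", 409, 4.5e-20, 2, 61, 40, 115)
for hit in (ce1, cbm48):
    print(hit.profile, round(coverage(hit), 3), passes(hit))
```

Both hits pass these placeholder cutoffs. The short CBM48_e68 profile reaches a much less extreme e-value than CE1 (4.5e-20 versus 2.9e-55), so a stricter e-value cutoff such as `max_evalue=1e-25` removes it while keeping CE1, which is one way a short CBM can be lost to a stringent threshold.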
Please let me know if there is anything unusual.