Inquiry regarding the interpretation of 'overview.txt'

jylee324 commented 1 year ago

Hi,

I ran run_dbcan on my metagenome data with the HMMER, dbCAN_sub, and DIAMOND tools selected and obtained an overview.txt file. I have a few questions about the results.

As I understand from the dbCAN webpage's help section, it is recommended to use results annotated by at least two of the three tools (HMMER, dbCAN_sub, DIAMOND). However, I am confused about some cases.

First, what is the ideal approach when only two tools annotate a gene and their annotations differ? For example, how should I interpret a case where dbCAN_sub annotates GT2 and DIAMOND annotates GT0? (HMMER returned no annotation for that gene.) Is there a specific hierarchy of reliability among the three methods (HMMER, dbCAN_sub, DIAMOND) that I can follow?

Also, I encountered a more complex case, shown below.

HMMER: CE1(159-404)
dbCAN_sub: CBM48_e68+CE1_e22
DIAMOND: CE0

In this case, all three results are different. What is the ideal way to interpret this? I would like to take only CE1, which is common to two tools, but is this a valid approach? If there is a better method, please let me know.

I sincerely thank you for developing such a great tool.

Best,

yinlabniu commented 1 year ago

Thanks for the question. We used to have a hierarchy when we had Hotpep: dbCAN > Hotpep > DIAMOND. Now that Hotpep has been replaced by dbCAN-sub, I would suggest dbCAN > dbCAN-sub > DIAMOND (here "dbCAN" is the result reported in the HMMER column). So for your first example, I would choose GT2. Your second example is trickier. I would choose CE1, since both dbCAN and dbCAN-sub gave that annotation. For CBM48, I would have to look at the HMMER alignment in the dbCAN-sub result. The E-value and coverage thresholds we set as defaults in the program might be too stringent for short CBMs.
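
As a rough illustration of the rule above, here is a minimal Python sketch. It keeps any family reported by at least two tools and otherwise falls back to the highest-priority tool that reported something. The function names `families` and `resolve` are made up for this example, and the dictionary keys `HMMER`, `dbCAN_sub`, and `DIAMOND` with `-` for "no annotation" are assumptions about how a row of overview.txt is loaded, not part of run_dbcan itself.

```python
import re

# Priority order suggested above: HMMER (dbCAN) > dbCAN_sub > DIAMOND.
PRIORITY = ("HMMER", "dbCAN_sub", "DIAMOND")


def families(annotation):
    """Return the family names found in one annotation cell.

    Strips HMMER coordinates such as "(159-404)" and dbCAN_sub suffixes
    such as "_e22". A "-" or empty cell is treated as no annotation.
    """
    if annotation in ("", "-"):
        return []
    names = []
    for part in annotation.split("+"):
        part = re.sub(r"\(.*?\)", "", part)  # drop "(start-end)"
        part = re.sub(r"_e\d+$", "", part)   # drop the "_eNN" suffix
        names.append(part)
    return names


def resolve(row):
    """Pick families for one gene from a dict of tool -> annotation cell."""
    calls = {tool: families(row.get(tool, "-")) for tool in PRIORITY}

    # Count how many tools report each family.
    counts = {}
    for fams in calls.values():
        for fam in set(fams):
            counts[fam] = counts.get(fam, 0) + 1

    # Families supported by at least two tools win.
    shared = sorted(fam for fam, n in counts.items() if n >= 2)
    if shared:
        return shared

    # Otherwise use the highest-priority tool that reported anything.
    for tool in PRIORITY:
        if calls[tool]:
            return calls[tool]
    return []


first = {"HMMER": "-", "dbCAN_sub": "GT2", "DIAMOND": "GT0"}
second = {"HMMER": "CE1(159-404)", "dbCAN_sub": "CBM48_e68+CE1_e22", "DIAMOND": "CE0"}
print(resolve(first))   # ['GT2']
print(resolve(second))  # ['CE1']
```

Note that this sketch drops CBM48 in the second example because only one tool reports it. That is exactly the kind of call that still needs a manual look at the alignment.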

Yanbin

jylee324 commented 1 year ago

Thank you for your response!

My inquiry has been precisely addressed.

I will also provide additional information regarding the second case.

output "hmmer.out"

HMM Profile	Profile Length	Gene ID	Gene Length	E Value	Profile Start	Profile End	Gene Start	Gene End	Coverage
CE1.hmm	227	Gene_4	409	2.9e-55	1	221	159	404	0.9691629955947136

output "detemp.out"

HMM Profile	Profile Length	Gene ID	Gene Length	E Value	Profile Start	Profile End	Gene Start	Gene End	Coverage
CBM48_e68.hmm|CBM48:192|CE0:190|CE6:12|GH10:3|CE1:2|GH62:1|3.1.1.73:1|3.2.1.55:1|3.2.1.8:1	61	Gene_4	409	4.5e-20	2	61	40	115	0.9672131147540983

output "diamond.out"

HMM Profile	Profile Length	Gene ID	Gene Length	E Value	Profile Start	Profile End	Gene Start	Gene End	Coverage
CBM48_e68.hmm	61	Gene_4	409	4.5e-20	2	61	40	115	0.9672131147540983

Please let me know if there is anything unusual.
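
For anyone reading these tables, a small Python sketch of two checks that can be made from the columns alone. The Coverage values shown are consistent with (Profile End - Profile Start) / Profile Length; that formula is inferred from the numbers in the tables, not taken from the run_dbcan source. The helper names `profile_coverage` and `overlaps` are made up for this example.

```python
def profile_coverage(profile_start, profile_end, profile_length):
    """Fraction of the HMM profile covered by the hit (inferred formula)."""
    return (profile_end - profile_start) / profile_length


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive ranges on the gene share at least one position."""
    return a_start <= b_end and b_start <= a_end


# Values copied from the CE1 row and the CBM48_e68 row for Gene_4.
ce1_cov = profile_coverage(1, 221, 227)
cbm48_cov = profile_coverage(2, 61, 61)
print(round(ce1_cov, 4), round(cbm48_cov, 4))  # 0.9692 0.9672

# CE1 spans gene positions 159-404, CBM48_e68 spans 40-115.
print(overlaps(159, 404, 40, 115))  # False
```

The two hits sit on separate stretches of Gene_4, so they do not compete for the same region of the protein.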

linnabrown/run_dbcan, Issue #115