linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Inquiry regarding the interpretation of 'overview.txt' #115

Closed jylee324 closed 1 year ago

jylee324 commented 1 year ago

Hi,

I ran run_dbcan on my metagenome data, selecting the HMMER, dbCAN_sub, and DIAMOND tools, and obtained an overview.txt file. I have a few questions about the results.

As I understand from the dbCAN webpage's help section, it is recommended to use results annotated by at least two of the three tools (HMMER, dbCAN_sub, DIAMOND). However, I am confused about some cases.

First, what is the ideal approach when only two tools annotate a gene and their annotations differ? For example, how should I interpret a case where dbCAN_sub annotates GT2 and DIAMOND annotates GT0 (HMMER returned no annotation for that gene)? Is there a specific hierarchy of reliability among the three methods (HMMER, dbCAN_sub, DIAMOND) that I can follow?

Also, I encountered a more complex case like the one below.

In this case, all three results are different. What is the ideal way to interpret this? I would like to keep only CE1, which is common to two tools, but is this a valid approach? If there is a better method, please let me know.

I sincerely thank you for developing such a great tool.

Best,

yinlabniu commented 1 year ago

Thanks for the question. We used to have a hierarchy when we had Hotpep: dbCAN > Hotpep > DIAMOND. Now that Hotpep has been replaced by dbCAN-sub, I would suggest dbCAN > dbCAN-sub > DIAMOND. So for your first example, I would choose GT2. Your second example is tricky. I would choose CE1, as both dbCAN and dbCAN-sub gave that annotation. But for CBM48, I would have to look at the HMMER alignment in the dbCAN-sub result. The E-value and coverage thresholds we set as defaults in the program might be too stringent for short CBMs.

Yanbin
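For readers who want to apply the suggested ordering programmatically, here is a minimal sketch in Python. It is not part of run_dbcan. It assumes each overview.txt row has been read into a dict with the keys `HMMER`, `dbCAN_sub`, and `DIAMOND`, that `-` means "no annotation", and that `+` separates multiple families in one cell. The helper names `families` and `choose` are hypothetical. Annotations are reduced to family level (for example `GH13_20` becomes `GH13`). A family reported by at least two tools is kept first, and otherwise the sketch falls back to the priority dbCAN (HMMER) > dbCAN-sub > DIAMOND described above.

```python
import re

# Priority suggested in the reply above: dbCAN (HMMER) > dbCAN-sub > DIAMOND.
PRIORITY = ("HMMER", "dbCAN_sub", "DIAMOND")


def families(cell):
    """Return the set of family names (e.g. 'CE1') found in one cell.

    Assumes '-' or an empty string means no annotation and '+' separates
    multiple families; suffixes such as '(159-404)' or '_e68' are dropped.
    """
    if cell is None or cell.strip() in ("", "-"):
        return set()
    found = set()
    for part in cell.split("+"):
        match = re.match(r"[A-Z]+\d+", part.strip())
        if match:
            found.add(match.group(0))
    return found


def choose(row):
    """Pick families for one gene and report how they were picked."""
    per_tool = {tool: families(row.get(tool)) for tool in PRIORITY}

    # Count how many tools report each family.
    counts = {}
    for fams in per_tool.values():
        for fam in fams:
            counts[fam] = counts.get(fam, 0) + 1

    # Prefer families supported by at least two tools.
    consensus = sorted(fam for fam, n in counts.items() if n >= 2)
    if consensus:
        return consensus, "consensus"

    # Otherwise fall back to the highest-priority tool with any annotation.
    for tool in PRIORITY:
        if per_tool[tool]:
            return sorted(per_tool[tool]), tool
    return [], "none"


# First example from the question: HMMER empty, dbCAN_sub GT2, DIAMOND GT0.
print(choose({"HMMER": "-", "dbCAN_sub": "GT2", "DIAMOND": "GT0"}))
# (['GT2'], 'dbCAN_sub')
```

A rule like this only automates the easy cases. Hits such as the short CBM48 match discussed above may still need a manual look at the alignment.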



jylee324 commented 1 year ago

Thank you for your response!

My inquiry has been precisely addressed.

I will also provide additional information regarding the second case.

  1. output "hmmer.out"

| HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CE1.hmm | 227 | Gene_4 | 409 | 2.9e-55 | 1 | 221 | 159 | 404 | 0.9691629955947136 |

  2. output "detemp.out"

| HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CBM48_e68.hmm\|CBM48:192\|CE0:190\|CE6:12\|GH10:3\|CE1:2\|GH62:1\|3.1.1.73:1\|3.2.1.55:1\|3.2.1.8:1 | 61 | Gene_4 | 409 | 4.5e-20 | 2 | 61 | 40 | 115 | 0.9672131147540983 |

  3. output "diamond.out"

| HMM Profile | Profile Length | Gene ID | Gene Length | E Value | Profile Start | Profile End | Gene Start | Gene End | Coverage |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CBM48_e68.hmm | 61 | Gene_4 | 409 | 4.5e-20 | 2 | 61 | 40 | 115 | 0.9672131147540983 |

Please let me know if there is anything unusual.
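As a quick sanity check on the rows above, the following Python sketch recomputes the Coverage column and tests whether the two hits overlap on Gene_4. The formula (Profile End - Profile Start) / Profile Length is an assumption inferred from the posted numbers, which it reproduces, and is not taken from the run_dbcan source. The function names are hypothetical.

```python
def profile_coverage(profile_start, profile_end, profile_length):
    """Fraction of the HMM profile spanned by the alignment (assumed formula)."""
    return (profile_end - profile_start) / profile_length


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive gene intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


# Values copied from the rows posted above.
ce1 = profile_coverage(1, 221, 227)    # CE1.hmm hit
cbm48 = profile_coverage(2, 61, 61)    # CBM48_e68.hmm hit

print(round(ce1, 4))    # 0.9692
print(round(cbm48, 4))  # 0.9672

# Gene coordinates: CBM48 at 40-115, CE1 at 159-404.
print(overlaps(40, 115, 159, 404))  # False
```

Under these assumptions the two hits cover separate regions of Gene_4 (40-115 and 159-404), so they do not compete for the same stretch of sequence. Whether both annotations should be kept is still a judgment call that depends on the alignments.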