linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Interpretation of output results #127

Closed MonaLiu421 closed 9 months ago

MonaLiu421 commented 1 year ago

Dear linnabrown,

Thank you for developing such good software. While using it, I obtained several result files, but I am unsure what exactly they mean. Understanding their purpose would greatly benefit my project. Could you explain the meaning of each of these three files? First: blastp.out. Second: dbsub.out. Third: sub.prediction.out.

Warm regards, Mona

yinlabniu commented 1 year ago

Thanks for the questions. We used three methods to predict substrates (two for CGC substrate prediction, and one for CAZyme substrate prediction, see our dbCAN3 paper and https://bcb.unl.edu/dbCAN2/help.php).

The first file (blastp.out) is the raw BLAST tabular output from searching each protein of a CGC against the proteins in PULs (from dbCAN-PUL; again, see our paper, approach B). From this file, we derive columns 1-5 of the third file, sub.prediction.out (see our paper for how).
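For readers who want to inspect blastp.out themselves, here is a minimal Python sketch. It assumes the file uses the standard 12-column BLAST tabular layout (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which this thread does not confirm. The `Hit` type and `best_hits` function are hypothetical helpers, not part of run_dbcan, and keeping the single best hit per query is only a way to skim the file, not the procedure used to build sub.prediction.out.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One BLAST hit; field positions assume the standard -outfmt 6 layout."""
    query: str       # column 1: protein from the CGC
    subject: str     # column 2: matched protein (here, from a PUL)
    identity: float  # column 3: percent identity
    evalue: float    # column 11: e-value
    bitscore: float  # column 12: bit score


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Return the highest-bitscore hit for each query sequence."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that do not match the assumed 12-column layout.
        if len(fields) < 12:
            continue
        hit = Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best
```

To use it, pass an open file handle, for example `best_hits(open("blastp.out"))`, and check the column layout against your own file first.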

The second file (dbsub.out) is the raw output of the hmmscan search of each protein against the dbCAN-sub HMM database. It is used for CAZyme substrate prediction (approach A, https://bcb.unl.edu/dbCAN2/help.php). From this file and cgc_standard.out, we use a majority rule to predict the substrate of each CGC (approach C, https://bcb.unl.edu/dbCAN2/help.php). The parsed information is added to columns 6-7 of the third file, sub.prediction.out (see our paper for how).
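To illustrate the general idea of a majority rule, here is a small Python sketch that picks the most frequent substrate among the CAZymes of one CGC. It is a generic majority vote under stated assumptions, not the dbCAN implementation: the function name `majority_substrate` is hypothetical, the input is assumed to be one substrate label (or nothing) per CAZyme, and any thresholds or tie-breaking used by run_dbcan are described in the paper, not here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def majority_substrate(
    substrates: Iterable[Optional[str]],
) -> Optional[Tuple[str, int, int]]:
    """Return (top substrate, its votes, total votes), or None if no votes.

    CAZymes without a predicted substrate (None or "") are ignored.
    On a tie, the substrate seen first wins; this is an assumption of
    the sketch, not a documented run_dbcan behavior.
    """
    votes = Counter(s for s in substrates if s)
    if not votes:
        return None
    total = sum(votes.values())
    top, count = votes.most_common(1)[0]
    return top, count, total
```

For example, a CGC whose CAZymes were labeled xylan, xylan, and pectin, plus one CAZyme with no label, yields xylan with 2 of 3 votes.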

In summary, the third file sub.prediction.out is the final file for substrate prediction for CGCs, combining results from approaches B and C. The other two files are intermediate files that can be checked if you want to know the detailed data on where the substrates are predicted from.

Yanbin

MonaLiu421 commented 1 year ago

Hi Yanbin, thank you for your prompt reply. I understand now. Two more questions: can both the CAZyme substrate predictions and the CGC substrate predictions be used for subsequent analysis? And what is the difference between a CGC and a PUL? Warm regards, Mona

yinlabniu commented 1 year ago

CGCs are computer predicted gene clusters (see our dbCAN2 paper https://academic.oup.com/nar/article/46/W1/W95/4996582), while PULs are CGCs with experimentally verified substrates. PULs are usually co-regulated/co-expressed. Some CGCs might be falsely predicted and will never become PULs.

Not all CAZymes are located in CGCs/PULs, so predicting substrates for CAZymes might be more broadly applicable. However, according to our dbCAN3 paper (table 1: https://academic.oup.com/nar/article/51/D1/D557/6833251), about 50% of the CAZymes in a bacterial genome are in CGCs, so predicting substrates for CGCs is also very useful and tends to be more accurate (table 1: https://academic.oup.com/nar/article/51/W1/W115/7147496).

Yanbin



MonaLiu421 commented 1 year ago

Thank you for your prompt reply. So, for a CGC predicted in the third file, sub.prediction.out, can I consider the corresponding PUL to have been verified by experiments?

yinlabniu commented 1 year ago

No, PULs are verified, but CGCs are not. The file provides you the substrate prediction for CGCs based on the PUL match.



MonaLiu421 commented 1 year ago

If I want to know the specific PULs, which file should I look in?

yinlabniu commented 1 year ago

https://bcb.unl.edu/dbCAN_PUL/



MonaLiu421 commented 1 year ago

ok, thanks a million.