linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0
138 stars 40 forks

Predict PULs with cgc.out? #62

Closed ehutchison2 closed 3 years ago

ehutchison2 commented 3 years ago

Thank you for the great tool! I ran run_dbcan using metagenomic sequencing data from fecal samples. For each sample, I provided the script with a file containing the assembled contigs (>500bp) and I enabled the CGCfinder tool. I have a cgc output file for each sample, but the cgc.out file does not seem to be in a human-readable format. How should this file be used or interpreted? Namely, I'm wondering if there is an easy way to predict PULs using the cgc.out file.

In addition, once I obtain PUL tables for my samples, can I treat the frequency of each PUL within each metagenome as a quantitative measurement that is comparable across samples? (The same question applies to CAZymes.)

yinlabniu commented 3 years ago

[image] This is an example of cgc.out. We explained its format at http://bcb.unl.edu/dbCAN2/help.php (end of the page), but the script generating this table has been updated a few times, so the current output is not exactly the same. More details can be found at http://bcb.unl.edu/dbCAN_seq/help.php#cgc.

How to use cgc.out to predict PULs is an excellent question. It comes down to predicting a substrate for each CGC: once a substrate is predicted, the CGC becomes a PUL. That is a whole different task and a very difficult one, and we are working on it.

On your next question, about quantitatively profiling the abundance or expression level of CGCs (putative PULs without substrate information): yes, you can map reads to the CGCs to obtain something like an FPKM value, and that can be compared across samples. This is another important question we are currently working on, and we hope to develop a new tool for it.

Yanbin
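For readers who want to try the read-mapping idea in the meantime, here is a minimal sketch of the FPKM arithmetic in Python. It is not part of run_dbcan. It assumes you have already mapped reads to your contigs with a tool of your choice and counted the fragments overlapping each CGC region. The function name `cgc_fpkm`, the CGC identifiers, and all numbers are hypothetical, and the choice of denominator (total mapped fragments in the sample) is an assumption you may want to change.

```python
def cgc_fpkm(fragment_counts, cgc_lengths_bp, total_mapped_fragments):
    """Return an FPKM-like value per CGC.

    FPKM = fragments_on_cgc * 1e9 / (cgc_length_bp * total_mapped_fragments)

    fragment_counts        -- dict: CGC id -> fragments overlapping that CGC
    cgc_lengths_bp         -- dict: CGC id -> CGC length in base pairs
    total_mapped_fragments -- library-size denominator for the sample
                              (assumed here: all mapped fragments)
    """
    if total_mapped_fragments <= 0:
        raise ValueError("total_mapped_fragments must be positive")

    fpkm = {}
    for cgc_id, count in fragment_counts.items():
        length = cgc_lengths_bp[cgc_id]
        if length <= 0:
            raise ValueError(f"non-positive length for {cgc_id}")
        # Normalise by CGC length (per kilobase) and by sequencing depth
        # (per million fragments); the two factors combine to 1e9.
        fpkm[cgc_id] = count * 1e9 / (length * total_mapped_fragments)
    return fpkm


# Hypothetical numbers for one sample.
example = cgc_fpkm(
    {"CGC1": 200, "CGC2": 50},
    {"CGC1": 10_000, "CGC2": 5_000},
    total_mapped_fragments=1_000_000,
)
```

With these made-up inputs, `example["CGC1"]` is 20.0 and `example["CGC2"]` is 10.0. Because the value is scaled by both CGC length and sequencing depth, it can be compared across samples, provided the same denominator convention is used for every sample.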