linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

How to summarize dbCAN3 results #172

Open NishatTamana51 opened 7 months ago

NishatTamana51 commented 7 months ago

I ran dbCAN3 on the whole-proteome data for my fungal species (and also for some other strains of this species) using the HMMER, dbCAN_sub, and DIAMOND tools. For the results, I kept only the predictions supported by >=2 of these tools, as suggested by dbCAN3. I want to compare the different strains of my fungal species by reporting the number of different CAZyme families and interpreting the differences. How can I do that? Specifically, how can I count the CAZyme families?

linnabrown commented 7 months ago

I have not calculated the abundance of CAZyme families before, so this is my first guess. If I am wrong, please correct me, @yinlabniu. Our overview results provide family domain annotations for each protein sequence. If you provide multiple sequences from one genome of a species, just count the annotations that share the same family name. If you care about subfamilies, count at the subfamily level. If not, you can just count at the family level.
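A minimal sketch of that counting in Python, not an official run_dbcan utility. It assumes the overview file is tab-separated with header columns named `HMMER` and `#ofTools`, that multi-domain annotations are joined with `+` and may carry a `(start-end)` suffix, that `-` marks an empty cell, and that a subfamily is written as `FAMILY_suffix` (for example `GH5_2`). Check these assumptions against your own output before relying on it. The function names and the `SAMPLE` text are hypothetical and only for illustration.

```python
import csv
import io
import re
from collections import Counter


def parse_families(annotation, keep_subfamily=False):
    """Split one annotation cell, e.g. 'GH5_2(34-310)+CBM1(400-430)', into names."""
    families = []
    for part in annotation.split("+"):
        part = re.sub(r"\(.*?\)", "", part).strip()  # drop the assumed '(start-end)' suffix
        if not part or part == "-":  # '-' is assumed to mean "no annotation"
            continue
        if not keep_subfamily:
            part = part.split("_")[0]  # 'GH5_2' -> 'GH5' (assumed subfamily naming)
        families.append(part)
    return families


def count_families(handle, column="HMMER", min_tools=2, keep_subfamily=False):
    """Count proteins per family, keeping rows supported by >= min_tools tools."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        if int(row["#ofTools"]) < min_tools:  # column name is an assumption
            continue
        # set(): a protein with two domains of one family is counted once
        counts.update(set(parse_families(row[column], keep_subfamily)))
    return counts


# Made-up example rows, only to show the assumed layout.
SAMPLE = (
    "Gene ID\tEC#\tHMMER\tdbCAN_sub\tDIAMOND\t#ofTools\n"
    "g1\t-\tGH5_2(34-310)+CBM1(400-430)\tGH5_e12\tGH5_2+CBM1\t3\n"
    "g2\t-\tGH5_7(10-300)\t-\tGH5_7\t2\n"
    "g3\t-\tAA9(1-200)\t-\t-\t1\n"
)

family_counts = count_families(io.StringIO(SAMPLE))
subfamily_counts = count_families(io.StringIO(SAMPLE), keep_subfamily=True)
```

For a real file, pass `open("overview.txt", newline="")` instead of the `io.StringIO` sample. This version counts each protein once per family; remove the `set()` call if you would rather count every domain.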

Rundon-svg commented 3 weeks ago

I had a similar need: comparing the number of genes in CAZyme major families and subfamilies across different species. I wrote a script for my project that aggregates and analyzes these counts, summarizing the numbers for subfamilies and major families across multiple species. Hope it's helpful. https://github.com/Rundon-svg/dbCAN_Sum
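For readers who only want the general idea of comparing strains or species, here is a minimal sketch. It is not the linked script and makes no claim about how that script works. It assumes you already have one family-count mapping per strain, and the function name, strain names, and numbers below are made up for illustration.

```python
from collections import Counter


def build_count_table(counts_by_strain):
    """Return a tab-separated family-by-strain table as a single string."""
    strains = list(counts_by_strain)
    # Union of every family seen in any strain, sorted for a stable row order.
    families = sorted({fam for c in counts_by_strain.values() for fam in c})
    lines = ["\t".join(["family"] + strains)]
    for fam in families:
        # A Counter returns 0 for a family that is missing from a strain.
        row = [str(counts_by_strain[s][fam]) for s in strains]
        lines.append("\t".join([fam] + row))
    return "\n".join(lines)


# Hypothetical counts for two strains.
example = {
    "strain_A": Counter({"GH5": 2, "CBM1": 1}),
    "strain_B": Counter({"GH5": 1, "AA9": 3}),
}
table = build_count_table(example)
```

Printing `table` gives one row per family and one column per strain, which can be opened in a spreadsheet or loaded into a statistics tool to look at the differences.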