Open NishatTamana51 opened 7 months ago
I have not calculated the abundance of CAZyme families before, so this is my first guess. If I am wrong, please correct me, @yinlabniu. Our overview results provide family domain annotations for each protein sequence. If you provide multiple sequences from one genome of a species, just count the annotations that share the same family name. If you care about subfamilies, count at the subfamily level. If not, you can just count at the family level.
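To make the counting concrete, here is a minimal Python sketch. It assumes each annotation is a string of labels joined by `+`, each optionally followed by a residue range in parentheses (for example `GH5_7(23-300)+CBM1(310-350)`), that `-` marks a missing prediction, and that a subfamily is written as the family name plus an underscore suffix. Please check these assumptions against your own overview file. The function names are hypothetical and not part of dbCAN.

```python
import re
from collections import Counter


def split_domains(annotation):
    """Return the CAZyme labels in one annotation string.

    Assumed format: labels joined by '+', each optionally followed by a
    residue range in parentheses; '-' or an empty string means no hit.
    """
    if annotation.strip() in ("", "-"):
        return []
    labels = []
    for part in annotation.split("+"):
        # Drop the "(start-end)" range, keeping only the family label.
        label = re.sub(r"\(.*?\)", "", part).strip()
        if label:
            labels.append(label)
    return labels


def to_family(label):
    """Collapse a subfamily label such as 'GH5_7' to its family 'GH5'."""
    return label.split("_", 1)[0]


def count_families(annotations, subfamily=False):
    """Count domain annotations per family (or per subfamily)."""
    counts = Counter()
    for annotation in annotations:
        for label in split_domains(annotation):
            counts[label if subfamily else to_family(label)] += 1
    return counts


# Made-up example annotations, one string per protein.
example = ["GH5_7(23-300)+CBM1(310-350)", "GH5_2(1-200)", "-", "AA9(5-220)"]
family_counts = count_families(example)
subfamily_counts = count_families(example, subfamily=True)
```

Note that this counts domain annotations, so a protein carrying two domains of the same family contributes two to that family. If you would rather count genes, deduplicate the labels per protein before adding them to the counter.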
I had a similar need: comparing the number of genes in CAZyme major families and subfamilies across different species. This is a script I wrote for my project to aggregate and summarize subfamily and major-family counts across multiple species. Hope it's helpful. https://github.com/Rundon-svg/dbCAN_Sum
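For anyone who just wants to see the idea, the aggregation step can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the linked script: it takes per-strain family counts you have already computed (the strain names and numbers below are made up) and lays them out as a family-by-strain table, filling in 0 where a family is absent from a strain.

```python
def family_table(counts_by_strain):
    """Build a family-by-strain count table as a list of rows.

    counts_by_strain maps a strain name to a {family: count} mapping.
    Families missing from a strain are reported as 0.
    """
    strains = sorted(counts_by_strain)
    families = sorted({fam for counts in counts_by_strain.values() for fam in counts})
    rows = [["family"] + strains]  # header row
    for fam in families:
        rows.append([fam] + [counts_by_strain[s].get(fam, 0) for s in strains])
    return rows


# Hypothetical counts for two strains.
counts_by_strain = {
    "strainA": {"GH5": 12, "AA9": 3},
    "strainB": {"GH5": 9, "CBM1": 4},
}
table = family_table(counts_by_strain)

# Print as tab-separated text, ready to paste into a spreadsheet.
for row in table:
    print("\t".join(str(cell) for cell in row))
```

Families that differ most between columns are then the natural starting point for interpreting strain differences.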
I have run dbCAN3 on my fungal whole-proteome data (and also on some other strains of this species) using the HMMER, dbCAN_sub and DIAMOND tools. For the results, I kept the CAZymes predicted by >=2 of these tools, as suggested by dbCAN3. I want to compare different strains of my fungal species by reporting the number of different CAZyme families and interpreting the differences. How can I do that? I mean, how can I count the CAZyme families?