linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Updated dbcan dependencies resulted in fewer CAZymes being detected with the hotpep tool. #37

Open Rob-murphys opened 4 years ago

Rob-murphys commented 4 years ago

I updated the dbcan dependencies today to match what is in the README and re-ran on the same set of genomes. Everything ran fine, but on inspecting the hotpep outputs there are now only 82 results versus 201 before the update. What could have caused such a big difference?

linnabrown commented 4 years ago

Could you show me the database and dependencies you used previously? It would also help if you could provide the command you ran. Thanks a lot.

Neato-Nick commented 4 years ago

This is an old issue, so I don't know what the original version was, but today I upgraded from 2.0.6 to 2.0.11 and it did not decrease hotpep detection in my organism:

tail -n +2 dbcan_results_2.0.11/cazyHotpep.out | cut -f 1 | sort | cut -c-2 | uniq -c
     12 AA
     89 CB
     18 CE
    146 GH
     51 GT
     14 PL
tail -n +2 dbcan_results_2.0.6/cazyHotpep.out | cut -f 1 | sort | cut -c-2 | uniq -c
     15 AA
     18 CE
    137 GH
     14 PL
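
The two pipelines above can be folded into one side-by-side table, which makes a drop in any class easier to spot. This is only a sketch: `compare_class_counts` is a hypothetical helper, not part of run_dbcan, and it assumes what the commands above imply, namely that `cazyHotpep.out` is tab-separated, has one header line, and carries the CAZy family in column 1.

```shell
# Hypothetical helper (not part of run_dbcan): per-class hotpep counts
# for two runs, side by side. Assumes tab-separated files with one header
# line and the CAZy family in column 1, as the pipelines above imply.
compare_class_counts() {
    printf 'class\told\tnew\n'
    awk -F '\t' '
        FNR == 1 { file++; next }          # new input file: skip its header
        {
            cls = substr($1, 1, 2)         # "GH5" -> "GH", like cut -c-2
            seen[cls] = 1
            n[file, cls]++
        }
        END {
            # A class absent from one run prints as 0.
            for (cls in seen)
                printf "%s\t%d\t%d\n", cls, n[1, cls], n[2, cls]
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

With the directories used in this comment, it would be called as `compare_class_counts dbcan_results_2.0.6/cazyHotpep.out dbcan_results_2.0.11/cazyHotpep.out`. Both files must be non-empty, since the header line is what tells the sketch a new file has started.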

The only superfamily where I "lost" results was AA (15 down to 12), and even that is not straightforward because the classifications are a bit different:

egrep "^AA" dbcan_results_2.0.11/cazyHotpep.out | cut -f 1,2 | sort | uniq -c | sed "1 i count\tCAZy_fam\tPPR_subfam" -
count   CAZy_fam        PPR_subfam
      1 AA1     4
      1 AA3     2
      2 AA3     8
      2 AA6     1
      5 AA6     2
      1 AA6     4
dbcan_results_2.0.6/cazyHotpep.out
count   CAZy_fam        PPR_subfam
      1 AA1     2
      3 AA3     1
      1 AA3     3
      2 AA6     3
      4 AA6     5
      4 AA8     3

They're a bit different for other superfamilies too, though really just at the subfamily level:

egrep "^CE" dbcan_results_2.0.11/cazyHotpep.out | cut -f 1,2 | sort | uniq -c | sed "1 i count\tCAZy_fam\tPPR_subfam" -
count   CAZy_fam        PPR_subfam
      2 CE11    12
      1 CE4     1
      4 CE5     17
     11 CE8     30
dbcan_results_2.0.6/cazyHotpep.out
count   CAZy_fam        PPR_subfam
      1 CE11    14
      1 CE4     1
      1 CE4     95
      4 CE5     9
     11 CE8     22
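
To see how much of the difference remains once the PPR subfamily column is ignored, the counts can be collapsed to family level. In the CE output pasted above, for example, CE5 and CE8 keep the same totals (4 and 11), while CE11 goes from 1 to 2 and CE4 from 2 to 1. The sketch below does that collapse for any class prefix. `compare_family_counts` is a hypothetical helper, not part of run_dbcan, and it assumes the same layout as the commands above: tab-separated, one header line, CAZy family in column 1.

```shell
# Hypothetical helper (not part of run_dbcan): per-family hotpep counts
# for one CAZy class in two runs, ignoring the subfamily in column 2.
# Usage: compare_family_counts CLASS_PREFIX OLD_FILE NEW_FILE
compare_family_counts() {
    printf 'family\told\tnew\n'
    awk -F '\t' -v class="$1" '
        FNR == 1 { file++; next }          # new input file: skip its header
        index($1, class) == 1 {            # family starts with the prefix
            seen[$1] = 1
            n[file, $1]++
        }
        END {
            # A family absent from one run prints as 0.
            for (fam in seen)
                printf "%s\t%d\t%d\n", fam, n[1, fam], n[2, fam]
        }
    ' "$2" "$3" | LC_ALL=C sort
}
```

For the comparison in this comment the call would be `compare_family_counts CE dbcan_results_2.0.6/cazyHotpep.out dbcan_results_2.0.11/cazyHotpep.out`. The match is a plain prefix test, so a two-letter class such as `CE` or `AA` is the intended argument.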