linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Errors in database #133

Closed Russel88 closed 9 months ago

Russel88 commented 11 months ago

Hi

There seem to be some errors in the database for 4.0.0. In the downloaded database files, PUL.faa, which is used for blastp, contains loci up to PUL0602, but dbCAN-PUL.tar.gz goes up to PUL0612, and dbCAN-PUL.substrate.mapping.xls goes up to PUL0656. Why are the last ones left out? There are also mismatches between dbCAN-PUL.tar.gz and dbCAN-PUL.substrate.mapping.xls. For PUL0608, the xls file says it should have both a GH112 and a GH136, but dbCAN-PUL.tar.gz has only a GH136. The same problem affects other PULs.

Database downloaded August 2nd, 2023
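
For anyone who wants to reproduce this check, here is a minimal Python sketch. It assumes that PUL identifiers appear as `PUL` followed by four digits somewhere in the FASTA header lines of PUL.faa and in the member names of dbCAN-PUL.tar.gz; that layout is my assumption, not something documented. The helper names (`pul_ids_from_lines`, `fasta_headers`, `tar_member_names`, `report`) are hypothetical and not part of run_dbcan.

```python
import re
import tarfile

# Assumption: PUL identifiers look like "PUL" followed by exactly four digits.
PUL_ID = re.compile(r"PUL\d{4}")


def pul_ids_from_lines(lines):
    """Collect every PULxxxx identifier found in an iterable of strings."""
    ids = set()
    for line in lines:
        ids.update(PUL_ID.findall(line))
    return ids


def fasta_headers(path):
    """Yield only the header lines (starting with '>') of a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line


def tar_member_names(path):
    """Return the member names of a tar archive (compression is auto-detected)."""
    with tarfile.open(path) as archive:
        return archive.getnames()


def report(name_to_ids):
    """For each source, give the highest ID and the IDs it lacks versus the union."""
    all_ids = set().union(*name_to_ids.values())
    return {
        name: {
            "max": max(ids) if ids else None,  # fixed-width IDs sort correctly as text
            "missing": sorted(all_ids - ids),
        }
        for name, ids in name_to_ids.items()
    }
```

With the downloaded files in the working directory, something like `report({"faa": pul_ids_from_lines(fasta_headers("PUL.faa")), "tar": pul_ids_from_lines(tar_member_names("dbCAN-PUL.tar.gz"))})` should list which PULs each file is missing relative to the other.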

ZhengJinfang1220 commented 11 months ago

Thank you for pointing out this issue. The substrate prediction was developed on the original version of dbCAN-PUL, which contained only 602 PULs. dbCAN-PUL has since been updated twice. We are working on updating the databases.

Russel88 commented 11 months ago

OK. Thank you for the quick response!

ZhengJinfang1220 commented 11 months ago

Thanks for your patience; we have updated the database. We are also working on a protocol that was submitted to Nature Protocols. The presubmission has been accepted, and we are preparing the full manuscript. We are going to develop more functions for the protocol. An updated version will be released before the end of next month.

azureycy commented 9 months ago

Hi, we have updated the files, including PUL_12112023.faa, dbCAN-PUL.substrate.mapping.xls, and dbCAN-PUL.tar.gz, which now contain PUL0001 to PUL0673.

The mismatches between dbCAN-PUL.tar.gz and dbCAN-PUL.substrate.mapping.xls arose because the results were generated by different dbCAN versions. dbCAN-PUL.tar.gz has now been updated to be consistent with the mapping table.
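
To confirm the consistency on your side, a comparison along these lines may help. This is only a sketch: it assumes you have already parsed each source into a dictionary from PUL ID to a set of CAZyme family names, and the parsing itself is left out because the file layouts are not described here. The function name `family_mismatches` is hypothetical.

```python
def family_mismatches(mapping_table, archive):
    """Compare CAZyme families per PUL between two sources.

    Both arguments are dicts of the form {"PUL0608": {"GH112", "GH136"}, ...}.
    Returns {pul_id: (only_in_mapping_table, only_in_archive)} for every PUL
    whose family sets differ; a PUL absent from one source counts as empty there.
    """
    mismatches = {}
    for pul_id in sorted(set(mapping_table) | set(archive)):
        in_table = mapping_table.get(pul_id, set())
        in_archive = archive.get(pul_id, set())
        if in_table != in_archive:
            mismatches[pul_id] = (
                sorted(in_table - in_archive),
                sorted(in_archive - in_table),
            )
    return mismatches
```

Applied to the PUL0608 case from the original report (GH112 and GH136 in the mapping table, only GH136 in the archive), this would flag GH112 as present only in the mapping table. An empty result means the two sources agree.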

linnabrown commented 9 months ago

Problem resolved.