shenwei356 / kmcp

Accurate metagenomic profiling && Fast large-scale sequence/genome searching
https://bioinf.shenwei.me/kmcp
MIT License

Availability of old gtdb databases? #41

Status: Closed (fplazaonate closed this issue 1 year ago)

fplazaonate commented 1 year ago

Hi @shenwei356,

I am performing some benchmarks and I was wondering if the gtdb r207 database was still available for download?

Best, Florian

shenwei356 commented 1 year ago

Yes, they are still there. Please use the download page: https://1drv.ms/u/s!Ag89cZ8NYcqtjHwpe0ND3SUEhyrp?e=QDRbEC

path: kmcp/v2021.12/metagenomic-profiling

fplazaonate commented 1 year ago

Hi @shenwei356, I have just reopened the issue. The gtdb v2021.12 database has 47,894 representative genomes, so it corresponds to gtdb r202, not gtdb r207 (https://gtdb.ecogenomic.org/stats/r202). Do you have a prebuilt database for gtdb r207?

shenwei356 commented 1 year ago

Oh... I see. Sorry, I don't think I have a prebuilt database for gtdb r207. But you can make one; it's easy.

Please follow the steps at https://bioinf.shenwei.me/kmcp/database/#gtdb and skip the step "Masking prophage regions and removing plasmid sequences with genomad (optional)".

fplazaonate commented 1 year ago

The comment in those steps says "# reference genomes are split into 10 chunks with 100bp overlap". I think the overlap is 150bp.
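To make the two numbers under discussion concrete, here is a minimal Python sketch of what splitting a genome into a fixed number of chunks with a fixed overlap means. It is only an illustration: the function name `split_with_overlap` and its parameters are hypothetical, and it is not KMCP's actual splitting code, which may round and place the overlap differently.

```python
from __future__ import annotations


def split_with_overlap(seq: str, n_chunks: int = 10, overlap: int = 150) -> list[str]:
    """Split seq into up to n_chunks pieces of near-equal size.

    Every piece except the last is extended by `overlap` bases into the
    next piece, so k-mers spanning a chunk boundary are not lost.
    Hypothetical helper for illustration, not KMCP's implementation.
    """
    if n_chunks < 1 or overlap < 0:
        raise ValueError("n_chunks must be >= 1 and overlap must be >= 0")

    # Ceiling division: base chunk size before the overlap is added.
    step = -(-len(seq) // n_chunks)

    chunks = []
    for i in range(n_chunks):
        start = i * step
        if start >= len(seq):
            break  # sequence shorter than n_chunks bases, or empty
        end = min(start + step + overlap, len(seq))
        chunks.append(seq[start:end])
    return chunks


# Toy 10,000 bp "genome" built from a deterministic pattern.
genome = "".join("ACGT"[(i * 7 + i // 3) % 4] for i in range(10_000))
chunks = split_with_overlap(genome, n_chunks=10, overlap=150)
```

With these assumptions, a 150bp overlap means the last 150 bases of each chunk are repeated at the start of the next one. With 100bp the shared region would be shorter.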

fplazaonate commented 1 year ago

I successfully created the database. Thanks.