vrmarcelino / CCMetagen

Microbiome classification pipeline
GNU General Public License v3.0

Error downloading the "ncbi_nt_no_env_11jun2019" database #50

Closed. hehanhhh closed this issue 1 year ago.

hehanhhh commented 1 year ago

Hello. When I download the "ncbi_nt_no_env_11jun2019" database and try to decompress it, I keep getting a message that the file is corrupted, with CRC checksum errors. I have tried a different download tool and a different decompression tool, but the result is the same. Is there any solution? Thank you!

vrmarcelino commented 1 year ago

Hi, apologies for the late reply. We have had issues with the mirrored files on CloudStor becoming corrupted, so the download works sometimes and not others. I suspect this is happening again. I am looking for a new place to store the database, but that might take a few months, sorry. For now, the only thing that might help is to try again. I can also help you build your own custom database if that helps.
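Because the corruption only affects some downloads, a local integrity check can tell you whether a retry actually succeeded before you spend time unpacking. The sketch below assumes the archive is gzip-compressed (the thread does not state the archive format or the mirror URL, so both are assumptions). It streams the whole file through Python's gzip module, which compares the CRC-32 stored in the gzip trailer with the decompressed data, and re-downloads until the check passes or the attempts run out. The helper names are hypothetical.

```python
import gzip
import urllib.request
import zlib
from pathlib import Path


def gzip_is_intact(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file; gzip verifies each member's CRC-32 at its end."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (an OSError) covers CRC/length mismatches and bad headers;
        # EOFError covers truncated downloads; zlib.error covers corrupt deflate data.
        return False


def download_until_intact(url: str, dest: Path, attempts: int = 3) -> bool:
    """Hypothetical helper: re-download `url` to `dest` until the gzip check passes."""
    for attempt in range(1, attempts + 1):
        urllib.request.urlretrieve(url, dest)
        if gzip_is_intact(dest):
            return True
        print(f"attempt {attempt}: {dest} failed the gzip integrity check")
    return False
```

For example, `download_until_intact(url, Path("ncbi_nt_no_env_11jun2019.tar.gz"))`, where `url` is whatever mirror link is currently published and the file name is only illustrative. For a .tar.gz this checks the gzip layer only; for a zip archive, `zipfile.ZipFile(path).testzip()` performs the equivalent CRC check.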

hehanhhh commented 1 year ago

@vrmarcelino Thank you for your reply. Is there any way to speed up building a local database? Following the tutorial is very slow for us, and it is very memory intensive.

vrmarcelino commented 1 year ago

Hi! Is this the KMA step? You'd probably need a server with ~500 GB of memory.
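Before starting the KMA step, it can help to confirm that the machine actually has that much RAM. A minimal sketch, assuming a Linux host (it reads POSIX `sysconf` values); the 500 GB default only mirrors the rough figure in the reply above and is not a measured requirement. The function names are hypothetical.

```python
import os


def total_memory_gb() -> float:
    """Installed physical RAM in GB (10**9 bytes), read via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * pages / 1e9


def check_memory_for_kma(required_gb: float = 500.0) -> bool:
    """Return False (and say why) if installed RAM is below the rough requirement."""
    available = total_memory_gb()
    if available < required_gb:
        print(f"Only {available:.0f} GB of RAM; roughly {required_gb:.0f} GB suggested.")
        return False
    return True
```

This only checks installed memory, not what a cluster scheduler will actually grant the job, so the job's memory request still needs to be set accordingly.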

hehanhhh commented 1 year ago

@vrmarcelino Thanks. I have now successfully downloaded the "ncbi_nt_no_env_11jun2019" database, but do I need to run kma index on this file? If kma index is required, what file should be used as input? If kma index is not needed: when I test KMA with the command below, it stays in the running state without outputting any results. Any solution? Thanks a lot!

kma -ipe /clusterfs/ZH11_FDSW220008794-2r_1.clean.fq.gz /clusterfs/ZH11_FDSW220008794-2r_2.clean.fq.gz -o /clusterfs/ ZH11_kma -t_db /clusterfs/ncbi_nt_no_env_11jun2019 -t 30 -1t1 -mem_mode -and -apm f -ef

vrmarcelino commented 1 year ago

Hi!

Great that the download worked!

This database is already indexed so you can proceed directly to the kma mapping step.
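Since the downloaded database is already indexed, a quick sanity check after extraction is to confirm that the index files sit next to the prefix passed to -t_db. A minimal sketch; the four suffixes below are an assumption based on what `kma index` typically writes and may differ between KMA versions, so compare them with the files actually present in the extracted folder. The helper name is hypothetical.

```python
from pathlib import Path

# Assumption: files typically written by `kma index -o <prefix>`; verify for your KMA version.
KMA_INDEX_SUFFIXES = (".comp.b", ".length.b", ".name", ".seq.b")


def missing_index_files(db_prefix: str) -> list[str]:
    """Return the expected index files that do not exist for a KMA database prefix."""
    prefix = Path(db_prefix)
    candidates = [prefix.parent / (prefix.name + suffix) for suffix in KMA_INDEX_SUFFIXES]
    return [str(path) for path in candidates if not path.exists()]
```

For example, `missing_index_files("clusterfs/ncbi_nt_no_env_11jun2019")` should return an empty list when the prefix used in this thread's kma command points at a complete index; note that -t_db takes the prefix without any file extension.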

How much memory are you giving to KMA? It might be stuck without memory.

I've also noticed a stray space in the command (in "-o /clusterfs/ ZH11_kma"). I'd also recommend using fewer cores (4 is usually good). Assuming your working directory contains a folder called 'clusterfs', maybe try:

kma -ipe clusterfs/ZH11_FDSW220008794-2r_1.clean.fq.gz clusterfs/ZH11_FDSW220008794-2r_2.clean.fq.gz -o clusterfs/ZH11_kma -t_db clusterfs/ncbi_nt_no_env_11jun2019 -t 4 -1t1 -mem_mode -and -apm f -ef
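One way to avoid stray-space mistakes like the one in the original command is to build the command as a list of arguments instead of one long string: a shell splits "-o /clusterfs/ ZH11_kma" into "-o /clusterfs/" followed by a separate "ZH11_kma" word. The sketch below (hypothetical helper names; the flags are copied verbatim from the suggested command) assembles that command with 4 threads, rejects any argument containing whitespace, and can print a correctly quoted shell line for copy-pasting.

```python
import shlex
import subprocess


def build_kma_command(read1: str, read2: str, out_prefix: str, db_prefix: str,
                      threads: int = 4) -> list[str]:
    """Assemble the suggested kma mapping command as an argv list."""
    argv = [
        "kma",
        "-ipe", read1, read2,
        "-o", out_prefix,
        "-t_db", db_prefix,
        "-t", str(threads),
        "-1t1", "-mem_mode", "-and",
        "-apm", "f",
        "-ef",
    ]
    # Whitespace inside a path here is almost certainly a typo such as "/clusterfs/ ZH11_kma".
    for arg in argv:
        if any(ch.isspace() for ch in arg):
            raise ValueError(f"argument contains whitespace: {arg!r}")
    return argv


def as_shell_line(argv: list[str]) -> str:
    """Quote each argument so the line can be pasted into a shell unchanged."""
    return shlex.join(argv)


def run_kma(argv: list[str]) -> None:
    """Run kma directly (no shell involved); raises CalledProcessError on failure."""
    subprocess.run(argv, check=True)
```

Passing the list straight to `subprocess.run` means no shell word-splitting happens at all, so each path reaches kma exactly as written.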