Closed liuchen92 closed 2 years ago
Additionally, the pre-indexed database downloaded via
wget -c 'https://cloudstor.aarnet.edu.au/plus/s/vfKH9S8c5FVGBjV/download?path=%2F&files=ncbi_nt_no_env_11jun2019.zip'
could not be unzipped; it failed with
error: invalid compressed data to inflate file
Hi!
Could you have a look at the header of the file you are trying to index? Either post an example or send part of the file to my email. Maybe the awk version did something to it that it wasn't supposed to?
I was not able to reproduce the unzipping error with the ncbi_nt_no_env_11jun2019.zip file. Your download might have failed, could you try again?
Thanks!
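One way to look at the headers, as asked above, is a short Python sketch that collects the first few header lines of the FASTA file. The function name `first_headers` and the default of five headers are illustrative choices, not part of the pipeline; the file name in the usage comment is the one from this thread.

```python
from itertools import islice


def first_headers(path, n=5):
    """Return the first n FASTA header lines (lines starting with '>')."""
    # errors="replace" keeps the scan going even if the file has odd bytes.
    with open(path, "r", errors="replace") as handle:
        headers = (line.rstrip("\n") for line in handle if line.startswith(">"))
        return list(islice(headers, n))


# Example usage (file name taken from the thread):
# for header in first_headers("nt_w_taxid.fas"):
#     print(header)
```

Posting the printed lines would be enough to compare a header that kma accepts with one that it rejects.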
Thanks.
I solved the zip problem; it might have been related to the file name (I renamed 'download?path=%2F&files=ncbi_nt_no_env_11jun2019.zip' to ncbi_nt_no_env_11jun2019.zip).
The awk step seems unnecessary, as rename.py worked perfectly on the nt database.
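To tell a truncated or corrupted download apart from a naming problem, the archive can be tested before unzipping. Below is a minimal Python sketch using the standard `zipfile` module; the function name `check_zip` is hypothetical, and the file name in the usage comment is the one mentioned in this thread.

```python
import zipfile
import zlib


def check_zip(path):
    """Return None if the archive looks intact, else a short problem description."""
    if not zipfile.is_zipfile(path):
        return "not a zip archive (the download may be incomplete)"
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and checks its CRC; it returns the
            # name of the first bad member, or None if all members are fine.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
        return f"archive could not be read: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None


# Example usage (file name taken from the thread):
# print(check_zip("ncbi_nt_no_env_11jun2019.zip") or "archive looks intact")
```

If the check reports a problem, downloading the file again is the likely fix.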
Hi,
I followed your instructions, as below, to process the recently downloaded nt and nucl_gb.accession2taxid data.
I then used rename.py to produce nt_w_taxid.fas for building the kma database.
But when I tried to build a kma database from nt_w_taxid.fas with a command like this:
kma index -i nt_w_taxid.fas -o kma
it threw the error "unsupported file format".
When I instead applied rename.py to nt directly, rather than to nt_sequential.fa, the kma database build worked fine.
I have no idea whether converting the GenBank FASTA file to sequential FASTA is necessary.
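For readers unsure what the conversion does: the awk command from the instructions is not quoted in this thread, so the following assumes that "sequential" means each record's sequence sits on a single line. Under that assumption, the step can be sketched in Python as below. The function name `to_single_line_fasta` is hypothetical, and this is not the pipeline's own script.

```python
def to_single_line_fasta(in_path, out_path):
    """Rewrite a FASTA file so each record is one header line plus one sequence line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        have_sequence = False  # True while sequence text still needs a newline
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if have_sequence:
                    dst.write("\n")  # close the previous record's sequence line
                    have_sequence = False
                dst.write(line + "\n")
            elif line:
                dst.write(line)  # append sequence text without a line break
                have_sequence = True
        if have_sequence:
            dst.write("\n")


# Example usage (file names taken from the thread):
# to_single_line_fasta("nt", "nt_sequential.fa")
```

Since kma accepted the output of rename.py run on nt directly, this step may be optional for indexing.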