qiyunlab / HGTector

HGTector2: Genome-wide prediction of horizontal gene transfer based on distribution of sequence homology patterns.
BSD 3-Clause "New" or "Revised" License

Cannot download data from NCBI refseq #98

Closed by YimingLiu000 1 year ago

YimingLiu000 commented 2 years ago

Hi,

I am using HGTector2 and constructing a new database with the following command:

hgtector database -t 9606 -o ./Hominidae --compile diamond --threads 12 --retries 5 --delay 1 --timeout 5

The following message then stayed on the screen without changing, and the download file remained empty:

Database building started at 2022-08-08 23:00:10.549497.
Downloading NCBI taxonomy database...

After a KeyboardInterrupt, the output was:

^CTraceback (most recent call last):
  File "/home/kuihuadufu/miniconda3/envs/hgtector/bin/hgtector", line 96, in <module>
    main()
  File "/home/kuihuadufu/miniconda3/envs/hgtector/bin/hgtector", line 35, in main
    module(args)
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/database.py", line 124, in __call__
    self.retrieve_taxdump()
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/database.py", line 263, in retrieve_taxdump
    run_command(f'rsync -Ltq {server}/{rfile} {self.down}')
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/util.py", line 237, in run_command
    res = subprocess.run(
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/subprocess.py", line 503, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/subprocess.py", line 1139, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt

I am not sure what is causing this. Any help or suggestions you may have would be greatly appreciated.

qiyunzhu commented 2 years ago

Hello @YimingLiu000 Thanks for your interest in our program. This may be happening because the Internet connection to the NCBI FTP server is failing. To check, can you run the following command in a terminal?

wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz

PS: I don't think a database of human sequences will be useful for HGT detection... Unless you just wanted to test if the program runs.
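One thing the wget test above does not cover: it goes over HTTPS (TCP port 443), while the traceback shows the program calling rsync. Assuming the server there is addressed with an rsync:// URL (the traceback does not show the value of `server`), the transfer would use the rsync daemon's default TCP port 873, which some firewalls block even when HTTPS works. A minimal sketch for checking both ports; `can_connect` is a hypothetical helper, not part of HGTector:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_connect("ftp.ncbi.nlm.nih.gov", 443)` and `can_connect("ftp.ncbi.nlm.nih.gov", 873)` can be compared. If the first returns True and the second False, a blocked rsync port is a plausible explanation for a download that never starts.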

YimingLiu000 commented 2 years ago

Yes, I can run it and download this file successfully. I then put it in ./download/. I also successfully downloaded assembly_summary_refseq.txt with the following command:

wget -c https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt

However, when I ran the command below, the same error appeared again. Maybe there is something wrong with rsync on my computer. I would appreciate it if you could provide another way to batch download.

hgtector database --output . --cats viral --sample 1 --rank order --compile diamond

Database building started at 2022-08-09 12:10:44.948725.
Using local file taxdump.tar.gz.
Reading NCBI taxonomy database... done.
Total number of TaxIDs: 2437951.
Using local file assembly_summary_refseq.txt.
Reading RefSeq assembly summary... done.
Total number of genomes: 269699.
Genome categories: viral
Downloading genome list per RefSeq category...
^CTraceback (most recent call last):
  File "/home/kuihuadufu/miniconda3/envs/hgtector/bin/hgtector", line 96, in <module>
    main()
  File "/home/kuihuadufu/miniconda3/envs/hgtector/bin/hgtector", line 35, in main
    module(args)
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/database.py", line 130, in __call__
    self.retrieve_categories()
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/database.py", line 364, in retrieve_categories
    asmset = set(get_categories('RefSeq'))
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/database.py", line 348, in get_categories
    run_command(f'rsync -Ltq {server}/{rfile} {self.tmpdir}')
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/site-packages/hgtector/util.py", line 237, in run_command
    res = subprocess.run(
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/subprocess.py", line 503, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/home/kuihuadufu/miniconda3/envs/hgtector/lib/python3.10/subprocess.py", line 1139, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
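Both tracebacks stop inside `subprocess.run`, waiting to read rsync's output, so the Python side blocks for as long as rsync itself does. To test rsync outside HGTector without the call hanging indefinitely, something like the following might help. This is only a sketch: `run_with_timeout` is a hypothetical helper, not HGTector's `run_command`, and the timeout value is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd` (a list of arguments).

    Returns (returncode, combined stdout/stderr text), or None if the
    command did not finish within `timeout` seconds.
    """
    try:
        res = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return res.returncode, res.stdout
```

A call such as `run_with_timeout(["rsync", "-Ltq", src, dest], 60)`, with `src` and `dest` standing in for the real source URL and target directory, would then return None after a minute instead of hanging. rsync also has its own `--timeout` and `--contimeout` options that could be tried directly on the command line.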

qiyunzhu commented 2 years ago

Hello @YimingLiu000 Sorry for the late response. I guess this section will help. It generates a list of URLs and lets you download them manually. If rsync does not work, you may try wget or other approaches.
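The linked section is not reproduced here, so the following is only a rough illustration of the general idea, not HGTector's own procedure. It builds protein FASTA URLs from an already downloaded assembly_summary_refseq.txt. It assumes the file is tab-separated with comment lines starting with `#`, that the TaxID is the 6th column and `ftp_path` the 20th, and that protein files are named `<directory name>_protein.faa.gz`; please verify these against NCBI's README for the assembly summary. `protein_urls` is a hypothetical helper name.

```python
def protein_urls(lines, taxids=None, ftp_col=19, taxid_col=5):
    """Build protein FASTA URLs from assembly summary lines.

    `taxids` is an optional set of TaxID strings to keep; the column
    indices (0-based) are assumptions about the file layout.
    """
    urls = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(ftp_col, taxid_col):
            continue  # skip malformed rows
        if taxids is not None and fields[taxid_col] not in taxids:
            continue
        path = fields[ftp_col].rstrip("/")
        if path in ("", "na"):
            continue  # no download location recorded for this row
        base = path.rsplit("/", 1)[-1]  # last path component names the files
        urls.append(f"{path}/{base}_protein.faa.gz")
    return urls
```

The result can be written to a text file, one URL per line, and passed to `wget -i` for a batch download that does not involve rsync.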