Closed Wanli-HE closed 1 year ago
here is the code: ./database_downloader.sh
This error actually comes from the script check_and_download_database.py. NCBI Entrez often fails when downloading many sequences at once, and does not retry after a failure. The new version of this script lets you define smaller batches when downloading the complementary sequences, and retries any sequences that were not downloaded.
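The batch-and-retry idea described above can be sketched roughly as follows. This is only an illustration, not the code in check_and_download_database.py: the function name `download_in_batches`, its parameters, and the `fetch` callable (which takes a list of IDs and returns a dict of the sequences it managed to get) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple


def download_in_batches(
    ids: Iterable[str],
    fetch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,
    max_rounds: int = 3,
    pause: float = 0.0,
) -> Tuple[Dict[str, str], Set[str]]:
    """Fetch IDs in small batches and retry whatever is still missing.

    `fetch` is a hypothetical callable: it receives a list of IDs and
    returns {id: sequence} for the IDs it actually downloaded.
    """
    missing = list(dict.fromkeys(ids))  # drop duplicates, keep order
    downloaded: Dict[str, str] = {}
    for _ in range(max_rounds):
        if not missing:
            break
        for start in range(0, len(missing), batch_size):
            batch = missing[start:start + batch_size]
            try:
                downloaded.update(fetch(batch))
            except Exception as err:  # e.g. a network or parsing failure
                print(f"batch starting at index {start} failed: {err}")
            time.sleep(pause)  # optional delay between requests
        # Only the IDs that are still absent go into the next round.
        missing = [i for i in missing if i not in downloaded]
    return downloaded, set(missing)
```

With Biopython, `fetch` could presumably wrap a call such as `Entrez.efetch(db="nuccore", id=",".join(batch), rettype="fasta", retmode="text")` and parse the result, but that part is left out here so the sketch runs without network access. The second return value lists the IDs that could not be downloaded after `max_rounds` attempts.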
OK, so I need to keep re-downloading it until no more errors are raised?
If you have already run the script database_downloader.sh, then you should already have a few thousand sequences in the file plasmid_refseq.fasta. If so, you can directly run the command python3 check_and_download_database.py download. It will prompt you for a few things to make sure all the sequences are downloaded. Even if you don't manage to get all the sequences, you should still get pretty good performance.
If you have deleted the file plasmid_refseq.fasta, then you should probably run database_downloader.sh again.
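To check which of the two cases applies, one option is to count the FASTA records already present. The sketch below is a hypothetical helper (`count_fasta_records` is not part of the repository) and assumes plasmid_refseq.fasta is in the current directory; adjust the path if it lives elsewhere.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA header lines (those starting with '>').

    Returns 0 if the file does not exist.
    """
    fasta = Path(path)
    if not fasta.is_file():
        return 0
    with fasta.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Assumed location of the file; change it to match your setup.
    n = count_fasta_records("plasmid_refseq.fasta")
    if n == 0:
        print("No sequences found: run database_downloader.sh again.")
    else:
        print(f"{n} sequences found: run check_and_download_database.py download.")
```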
/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py:903: UserWarning: Failed to save epost.dtd at /usr/local/home/hsv709/.config/biopython/Bio/Entrez/DTDs/epost.dtd
  warnings.warn("Failed to save %s at %s" % (filename, path))
Traceback (most recent call last):
  File "/mibi/users/Wanli/test_plasplinev1.4.1/Plaspline/db/db/plasforest/check_and_download_database.py", line 95, in <module>
    download_missing(list_missing, email)
  File "/mibi/users/Wanli/test_plasplinev1.4.1/Plaspline/db/db/plasforest/check_and_download_database.py", line 77, in download_missing
    result = Entrez.read(request)
  File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/__init__.py", line 508, in read
    record = handler.read(handle)
  File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py", line 304, in read
    self.parser.ParseFile(handle)
  File "/home/conda/feedstock_root/build_artifacts/python-split_1653669926144/work/Modules/pyexpat.c", line 459, in EndElement
  File "/mibi/Wanli/anaconda/envs/plasplinev1.4.1/lib/python3.9/site-packages/Bio/Entrez/Parser.py", line 666, in endErrorElementHandler
    raise RuntimeError(value)
RuntimeError: Some IDs have invalid value and were omitted. Maximum ID value 18446744073709551615