Closed BenjaminJPerry closed 1 year ago
I tried using the latest release of makeblastdb and had the same error:
inscrutable$ makeblastdb -version
makeblastdb: 2.13.0+
Package: blast 2.13.0, build Jul 18 2022 22:49:37
inscrutable$ makeblastdb -in GTDB-latest.fna -input_type fasta -dbtype nucl -taxid_map taxid.map -parse_seqids -out GTDB-r207
Building a new DB, current time: 11/24/2022 09:38:13
New DB name: /bifo/scratch/2022-BJP-GTDB_Benchmarking/gtdb-latest/GTDB-r207
New DB title: GTDB-latest.fna
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Error: NCBI C++ Exception:
T0 "/opt/conda/conda-bld/blast_1658184301332/work/blast/c++/src/corelib/ncbistr.cpp", line 640: Error: (CStringException::eConvert) ncbi::NStr::StringToInt() - Cannot convert string '2988443261' to int, overflow (m_Pos = 0)
We hashed the taxon name (in lower case) of each taxon node to uint64 using xxhash and converted it to uint32 (max value: (1<<32) - 1 = 4294967295). However, it looks like more than one tool (https://github.com/shenwei356/gtdb-taxdump/issues/4) stores a taxid as an int32 (max value: (1<<31) - 1 = 2147483647).
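To make the range mismatch concrete, here is a minimal Python sketch of the arithmetic. The helper name `fits_int32` is made up for illustration and is not part of taxonkit or BLAST. It only checks the two values quoted in this thread against the limits above.

```python
# Limits quoted above.
UINT32_MAX = (1 << 32) - 1  # 4294967295
INT32_MAX = (1 << 31) - 1   # 2147483647


def fits_int32(taxid: int) -> bool:
    """Hypothetical helper: True if taxid fits in a signed 32-bit integer."""
    return 0 <= taxid <= INT32_MAX


old_taxid = 2988443261  # value from the makeblastdb error message
new_taxid = 840959613   # value for GCF_000980105.1 after the update

print(old_taxid <= UINT32_MAX)  # True: valid as uint32
print(fits_int32(old_taxid))    # False: overflows int32
print(fits_int32(new_taxid))    # True
```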
It's time for a change.
Just updated the code. Please test it.
$ grep GCF_000980105.1 gtdb-taxdump/R207/taxid.map
GCF_000980105.1 840959613
I'll update https://github.com/shenwei356/gtdb-taxdump later.
Tagged a new release: v0.14.0
Hello Wei Shen,
This is not strictly an error with taxonkit create-taxdump, but more of a feature request?
I'm trying to use the taxid.map generated using taxonkit create-taxdump for the GTDB database (r207) when making a blastn database of the complete set of GTDB representative genomes (r207).
Making the taxdump using taxonkit,
Using it to make the blast database (where the error occurs),
In the taxid.map generated with taxonkit,
It seems like the size of the value is too large for makeblastdb to handle when building?
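One way to check this suspicion is to scan the taxid.map for values above the signed 32-bit limit. The following is a rough sketch, assuming the two-column "accession taxid" layout shown in this thread. The function name and the `HYPOTHETICAL_ACCESSION` entry are invented for illustration; only the two taxid values come from the thread.

```python
INT32_MAX = (1 << 31) - 1  # 2147483647


def find_oversized_taxids(lines):
    """Return (accession, taxid) pairs whose taxid exceeds the int32 maximum."""
    oversized = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession, taxid = fields[0], int(fields[1])
        if taxid > INT32_MAX:
            oversized.append((accession, taxid))
    return oversized


# Sample lines; the second accession is a placeholder, not a real record.
sample = [
    "GCF_000980105.1 840959613",
    "HYPOTHETICAL_ACCESSION 2988443261",
]
print(find_oversized_taxids(sample))
```

For a real file, an open file handle can be passed in place of the list, for example `find_oversized_taxids(open("taxid.map"))`. An empty result would suggest that every taxid fits in an int32.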
It may be more of an issue with makeblastdb, but I thought I would pass it on as it might be an easy fix in taxonkit 😋
Thank you for all the excellent bioinformatic software 🥇 😁
Ben