shenwei356 / gtdb-taxdump

GTDB taxonomy taxdump files with trackable TaxIds
MIT License

Failed to build Diamond-database with the taxonomy files #4

Closed emilhaegglund closed 1 year ago

emilhaegglund commented 2 years ago

Thanks for creating these taxonomy files. I was trying to use the files from R207 to build a Diamond database, but it failed when reading names.dmp with the following error message: Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.

I was just wondering if you have tried these taxonomy files with Diamond, and if you had any success? Cheers, Emil

shenwei356 commented 2 years ago

It seems it needs more memory than is available on your machine.

emilhaegglund commented 2 years ago

Strange, I'm on a machine with 128 GB of RAM, and building with the NCBI taxdumps was no problem. I will check with the Diamond developers then.

Thanks, Emil

shenwei356 commented 2 years ago

The GTDB taxonomy has 47894 species in r202, which belong to 28073+ species of the NCBI taxonomy. Is this the cause?

emilhaegglund commented 2 years ago

Hi, I asked the developers of Diamond if they had any clue. The reason is that the taxids in GTDB-taxdump are larger than 2^31, which is not supported in Diamond. See https://github.com/bbuchfink/diamond/issues/611#issuecomment-1250972200. I will see if they can create a fix for this issue.

Thanks for the quick replies.
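
For anyone who wants to check their own taxdump files for this, here is a minimal sketch that lists taxids too large for a signed 32-bit integer. It assumes the usual taxdump layout, where fields are separated by `\t|\t` and the first field is the taxid. The function name `taxids_over_int32` and the sample rows are made up for illustration and are not real GTDB data.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def taxids_over_int32(lines):
    """Return taxids (first column of a taxdump file) greater than INT32_MAX."""
    too_large = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # taxdump fields are separated by "\t|\t"; the first one is the taxid
        taxid = int(line.split("|", 1)[0].strip())
        if taxid > INT32_MAX:
            too_large.append(taxid)
    return too_large


# Hypothetical rows in nodes.dmp layout (made-up taxids, not real GTDB data)
sample = [
    "1\t|\t1\t|\tno rank\t|",
    "2147483647\t|\t1\t|\tspecies\t|",
    "3000000000\t|\t1\t|\tspecies\t|",
]
print(taxids_over_int32(sample))
```

To run it on a real file, an open file handle can be passed in place of `sample`, for example `taxids_over_int32(open("nodes.dmp"))`.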

shenwei356 commented 2 years ago

That could happen; I use 'uint32' to store taxids.
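
To illustrate the range mismatch, here is a small sketch of what happens when a value that fits in an unsigned 32-bit integer is read as a signed one. It is only an illustration of the two integer ranges, not the actual code of Diamond or taxonkit; the helper name `as_int32` is hypothetical.

```python
import struct


def as_int32(value):
    """Reinterpret the 4 bytes of an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_int32(2**31 - 1))  # 2147483647, still fits in int32
print(as_int32(2**31))      # -2147483648, wraps to a negative number
print(as_int32(2**32 - 1))  # -1
```

Any taxid at or above 2^31 is valid as a uint32 but has no matching positive value in int32, so a tool that stores taxids as signed 32-bit integers cannot represent it.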

shenwei356 commented 2 years ago

I plan to use int32.

shenwei356 commented 1 year ago

Check taxonkit v0.14.0