turbomam / biosample-xmldb-sqldb

Tools for loading NCBI BioSample into an XML database and then transforming it into a SQL database
MIT License

Explain that NCBI's XML is chunked because there's a cap on the number of nodes in a BaseX database. #9

Closed: turbomam closed this 6 months ago

turbomam commented 6 months ago

Maximum number of XML nodes per BaseX database: ~2 billion (2^31).

https://docs.basex.org/wiki/Statistics

turbomam commented 6 months ago

We currently split the XML into ~35 chunks, each loaded into its own BaseX database, with ~150 million nodes per database.
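A quick back-of-the-envelope check of these numbers, as a sketch only. It assumes the per-database cap is exactly 2^31 nodes (the ~2 billion figure above) and treats the approximate ~35 chunks and ~150 million nodes per chunk as exact. The total (~5.25 billion nodes) could not fit in a single database, and 3 databases would be the bare minimum if each were filled to the cap, so 35 chunks keeps every database far below the limit.

```python
import math

# Assumed per-database node cap (2^31, i.e. ~2.1 billion), per the BaseX Statistics page.
BASEX_MAX_NODES_PER_DB = 2**31

# Approximate figures from this thread, treated as exact for the arithmetic.
chunks = 35
nodes_per_chunk = 150_000_000

# Total node count across all chunks.
total_nodes = chunks * nodes_per_chunk

# Fewest databases that could hold everything if each were filled right up to the cap.
min_databases = math.ceil(total_nodes / BASEX_MAX_NODES_PER_DB)

# Fraction of the cap that one chunk actually uses.
fraction_of_cap = nodes_per_chunk / BASEX_MAX_NODES_PER_DB

print(f"total nodes: ~{total_nodes:,}")
print(f"minimum databases at the cap: {min_databases}")
print(f"each chunk uses ~{fraction_of_cap:.1%} of the cap")
```

Filling databases close to the cap would reduce their number, but smaller chunks leave headroom if the upstream XML grows.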