ncbi / datasets

NCBI Datasets is a new resource that lets you easily gather data from across NCBI databases.
https://www.ncbi.nlm.nih.gov/datasets

Partial sequence download #368

Closed mkdevesh closed 4 months ago

mkdevesh commented 4 months ago

Hi, I tried to download the genome dataset for a bacterial species and noticed that for Mycoplasmoides genitalium it does not return the correct number of genomes when cross-checked against the Genome Dataset web page. I am using datasets version 16.17.3, which is the latest.

I used this command:

datasets download genome taxon 2097 --assembly-level chromosome --assembly-source genbank --filename ncbi_dataset_MG.zip

It returns only 1 genome, whereas the web page lists 7 at chromosome level.

Link to the genome dataset page: https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=2097&assembly_level=2:3

The same thing happens for taxon 2098, and I am afraid it may affect other species as well. Is there another way I should do this, or is this due to an issue in the database?

olearyna commented 4 months ago

Hi,

Thank you for opening this issue and providing the link to the genome list. The issue arises because six of the genomes have an assembly level of 'complete' and one has an assembly level of 'chromosome'. You can refer to the documentation page for more details on assembly levels: NCBI Assembly Levels.

To retrieve all genomes, you can filter your query by specifying both 'complete' and 'chromosome' assembly levels.

Use the following command to download all seven genomes:

datasets download genome taxon 2097 --assembly-level chromosome,complete_genome --assembly-source genbank --filename ncbi_dataset_MG.zip
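Before downloading, it can help to check how many genomes fall under each assembly level, since that is where the 1-versus-7 mismatch came from. The snippet below is a minimal sketch, not part of the original reply: `count_levels` is a hypothetical helper that tallies the second column of a tab-separated table read from stdin. The pipeline in the comment assumes that `datasets summary genome` with `--as-json-lines` and `dataformat tsv genome --fields accession,assminfo-level` behave as documented for your installed version.

```shell
# Hypothetical helper: tally rows per assembly level.
# Expects TSV on stdin with a header row; column 2 holds the assembly level.
count_levels() {
  awk -F'\t' 'NR > 1 { n[$2]++ } END { for (k in n) printf "%s\t%d\n", k, n[k] }' | sort
}

# Assumed usage (needs network access and the datasets/dataformat tools):
#   datasets summary genome taxon 2097 --assembly-source genbank --as-json-lines \
#     | dataformat tsv genome --fields accession,assminfo-level \
#     | count_levels
```

If the tally shows the genomes split across more than one level, list each of those levels in `--assembly-level` when downloading.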

If you have any other questions, please let me know.

Best regards,

Nuala