Closed Jigyasa3 closed 2 years ago
When I try datasets on an individual assembly from one of the above-mentioned BioProjects, it throws an error.
Code used:
/flash/BourguignonU/Tool/datasets download genome accession GCA_011075415.1 --filename GCA_011075415.1.zip --exclude-gff3 --exclude-protein --exclude-rna
Error:
Some of the assemblies provided ('GCA_011075415.1') are valid NCBI Assembly Accessions,
but are not in scope for NCBI Datasets.
Error: Input accessions not specified
Is datasets specific to some BioProjects only?
Hi Jigyasa,
Thanks for your feedback.
Both of the BioProjects (PRJNA508395 and PRJEB22522) and the Assembly record (GCA_011075415.1) that you are looking for describe metagenomic assemblies and we currently exclude metagenomes from NCBI Datasets. But you’re not the first person to ask for this data! Based on user feedback such as yours, we are planning to add these genomes to NCBI Datasets, tentatively in the next few months.
You can find more information about which genomes are included and which genomes are excluded from NCBI Datasets here: NCBI Datasets Available genomes
Best, Eric
Eric Cox, PhD [Contractor] (he/him/his) NCBI Datasets Sequence Enhancements, Tools and Delivery (SeqPlus) NIH/NLM/NCBI eric.cox@nih.gov
Hey @ericcox1
Thank you so much for replying and directing me to the correct resource for datasets usage. Looking forward to using this resource on metagenomes!
On a similar note, is there an existing tool for batch downloading assemblies from multiple BioProjects? I have a list of >100 BioProjects that I want to examine, and I really don't want to download the data manually.
Looking forward to your advice!
You're welcome.
To download data for assemblies from multiple BioProjects, you can provide a list of BioProject accessions as follows:
datasets download genome accession --inputfile bioproject_list.txt
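As a rough sketch of how such a list file might be tidied up first: the helper names clean_accessions and download_bioprojects below are hypothetical, as are the output file names, and the accession pattern (PRJ, two letters, then digits) is an assumption about what BioProject accessions look like. The only datasets options used are --inputfile and --filename, both of which appear in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the datasets CLI.

# Print a cleaned accession list from file "$1": strip carriage returns and
# surrounding whitespace, keep only lines shaped like BioProject accessions
# (assumed pattern: PRJ + two letters + digits), then sort and de-duplicate.
clean_accessions() {
  tr -d '\r' < "$1" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -E '^PRJ[A-Z]{2}[0-9]+$' \
    | sort -u
}

# Clean the list in "$1" and pass it to datasets in a single call.
# Output file names are placeholders.
download_bioprojects() {
  clean_accessions "$1" > bioproject_list.clean.txt
  datasets download genome accession \
    --inputfile bioproject_list.clean.txt \
    --filename bioprojects.zip
}
```

Calling download_bioprojects bioproject_list.txt would then request the whole list in one go. Lines that do not match the assumed pattern are dropped silently, so it may be worth comparing line counts before and after cleaning.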
Hi Jigyasa,
I wanted to let you know that metagenomes are now available in Datasets. Querying by the BioProject (PRJNA508395, PRJEB22522) and Assembly (GCA_011075415.1) accessions you mentioned above will now return the complete set of genome data.
You can also see these metagenomes on our Genomes page, for example, for PRJEB22522.
I’m going to close this issue for now. Please let me know if you have any other feedback.
Best, Eric
Hey all!
Thank you for this great resource! I am interested in downloading assemblies from a large list of BioProject IDs using datasets. When I try running it on a single BioProject ID, datasets cannot find the assembly, even though assemblies are associated with the BioProject on NCBI.
Code used:
/flash/BourguignonU/Tool/datasets download genome accession PRJNA508395 --filename PRJNA508395.zip
/flash/BourguignonU/Tool/datasets download genome accession PRJEB22522 --filename PRJEB22522.zip
When I try the example code in the tutorial,
/flash/BourguignonU/Tool/datasets download genome accession PRJEB35331 --filename test.zip
I get the zip file, showing that the code should work. The two BioProjects have assemblies associated with them.