ElinorSterner opened this issue 1 year ago
I just had this same question! I ended up doing it outside of ncbi-genome-download. I'm 95% sure my solution is correct :D
The GenBank or RefSeq `annotation_hashes.txt` file has columns named "Features hash" and "Proteins name hash". When the value in either of those columns is `D41D8CD98F00B204E9800998ECF8427E`, the file does not exist for that assembly. That value is the MD5 hash of empty input, meaning there was no annotation to hash.
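The marker value can be confirmed in Python: it is simply the MD5 digest of zero bytes. The helper below is a minimal sketch of the rule above for a single row. `has_annotation` is a hypothetical name and is not part of ncbi-genome-download.

```python
import hashlib

# MD5 digest of empty input; a hash column holding this value means nothing was annotated.
EMPTY_MD5 = hashlib.md5(b"").hexdigest().upper()


def has_annotation(features_hash: str, protein_names_hash: str) -> bool:
    """Return False if either hash equals the empty-input MD5, True otherwise."""
    return EMPTY_MD5 not in (features_hash.upper(), protein_names_hash.upper())


print(EMPTY_MD5)  # D41D8CD98F00B204E9800998ECF8427E
```

The comparison upper-cases both sides so it works whether the file stores the hashes in upper or lower case.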
Note that `annotation_hashes.txt` only exists in the group subdirectories of refseq/genbank (such as `plant`), not at the top level. For example, for all GenBank plant genomes: https://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/annotation_hashes.txt
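Putting it together, here is a rough sketch that downloads one group's `annotation_hashes.txt` and keeps only the accessions from your own list that have annotation. It assumes the file is tab-delimited, has the assembly accession in the first column, and has a header whose hash columns contain the words "feature"/"protein" and "hash" (matched loosely because the exact header text should be checked against the real file). The function names are hypothetical.

```python
import csv
import hashlib
import io
import urllib.request

# MD5 of empty input: the "no annotation" marker.
EMPTY_MD5 = hashlib.md5(b"").hexdigest().upper()


def fetch_annotation_hashes(url: str) -> str:
    """Download annotation_hashes.txt and return its contents as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def accessions_with_cds(text: str, wanted: set[str]) -> set[str]:
    """Return the accessions in `wanted` whose feature and protein-name hashes are non-empty."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [h.lower() for h in next(reader)]
    # Locate the hash columns by keyword (assumption: header wording may differ slightly).
    feat = next(i for i, h in enumerate(header) if "feature" in h and "hash" in h)
    prot = next(i for i, h in enumerate(header) if "protein" in h and "hash" in h)
    keep = set()
    for row in reader:
        # Skip blank/short rows and accessions we did not ask about.
        if len(row) <= max(feat, prot) or row[0] not in wanted:
            continue
        if EMPTY_MD5 not in (row[feat].upper(), row[prot].upper()):
            keep.add(row[0])
    return keep
```

Usage would look like `accessions_with_cds(fetch_annotation_hashes("https://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/annotation_hashes.txt"), my_gcas)`, where `my_gcas` is your set of GCA accessions. Matching is exact, so the version suffix (e.g. `.1`) must match, and an accession missing from the result may simply live in a different group's file.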
Hello, I want to check whether the GCA accessions I pulled from GenBank have CDS files before filtering further to decide what to download. I used `--formats cds-fasta` to look only at CDS files and `-n` to check rather than download. However, with `-n` it returns all the GCAs I input, not just the ones with CDS files. I want it to check whether a CDS file exists without downloading anything yet. Is there a way to do this?
Thanks, Elinor