Closed zdk123 closed 3 years ago
Ah, crud, I should have pinned biopython. 1.79 deprecating UnknownSeq strikes again. I'll fix this.
weird, I can't seem to rebase this PR on the current master branch. Maybe you can update it to be based on current master? That'll fix the biopython-related test failure.
I'll do that - I originally forked from DarianHole/ncbi-acc-download so that might explain it
@kblin rebased
Awesome, thanks. Apart from the wrong version number bump, things look good to me, thanks for the contribution! I'll fix the version number and cut a new release.
thanks - I can contribute the usage code over at secmet/mibig-json as well
This addresses #19, supplying an option to add genomic ranges to an accession download (e.g. the `from` and `to` parameters in the request query string). For large records, this saves a substantial amount of time and bandwidth compared to downloading the whole thing and then subsetting.
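As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of how a range could be turned into `from`/`to` query parameters. The function name `build_query`, the `id` parameter name, and the accession `AB_12345` are hypothetical placeholders. Requiring both coordinates is an assumption of the sketch. Rejecting a range when more than one accession is given mirrors the behaviour described in this PR.

```python
from urllib.parse import urlencode


def build_query(accessions, start=None, stop=None):
    """Build a query string, adding from/to only when a range is requested.

    Hypothetical helper for illustration; names are not taken from the PR.
    """
    # Assumption: a range is only meaningful with both coordinates present.
    if (start is None) != (stop is None):
        raise ValueError("a range needs both a start and a stop coordinate")

    has_range = start is not None
    if has_range:
        # A single coordinate pair cannot sensibly apply to several records.
        if len(accessions) != 1:
            raise ValueError("a genomic range requires exactly one accession")
        if start < 1 or stop < start:
            raise ValueError("expected 1 <= start <= stop")

    # 'id' is an assumed parameter name for the accession list.
    params = {"id": ",".join(accessions)}
    if has_range:
        params["from"] = start
        params["to"] = stop
    return urlencode(params)


if __name__ == "__main__":
    print(build_query(["AB_12345"], 1000, 2000))
```

Sending the coordinates with the request lets the server do the subsetting, which is where the time and bandwidth saving comes from.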
Example usage:
While combining multiple accessions with a genomic range triggers an error:
Of course, if you are picking arbitrary coordinates like this, you will sometimes land in the middle of an ORF. While NCBI won't complain, certain downstream applications I've run into don't like this. Therefore I've also added a `correct` option to the `--extended-validation` flag that filters these ORFs out. There's also a new unit test for the correction validator (note that `correct` does not get run when `all` is specified).
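To make the filtering idea concrete, here is a small dependency-free sketch, not the validator added in this PR. It assumes each feature records whether its start or end was cut off by the requested range. The `Feature` class and `drop_partial_cds` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Simplified stand-in for an annotated feature (hypothetical type)."""
    kind: str                    # e.g. "CDS" or "gene"
    start: int
    end: int
    partial_start: bool = False  # True if the range cut off the 5' side
    partial_end: bool = False    # True if the range cut off the 3' side


def drop_partial_cds(features):
    """Return features with any truncated CDS entries removed."""
    return [
        f for f in features
        if not (f.kind == "CDS" and (f.partial_start or f.partial_end))
    ]


if __name__ == "__main__":
    feats = [
        Feature("CDS", 1, 200, partial_start=True),  # cut by the range start
        Feature("CDS", 300, 900),                    # complete, kept
        Feature("gene", 1, 200, partial_start=True), # not a CDS, kept
    ]
    for f in drop_partial_cds(feats):
        print(f.kind, f.start, f.end)
```

Only truncated CDS features are dropped in this sketch; whether other feature types should be filtered too is a design choice the sketch leaves open.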