ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Entrez_fetch download sequences with variable start-stop lengths #122

Closed PWSmit closed 6 years ago

PWSmit commented 6 years ago

Hi Entrez team!

Is it possible to download sequences for a list of IDs, each with its own seq_start and seq_stop values? I would like to download a specific sub-region of the DNA sequence for each of several IDs for further analysis.

I am using the following function (partly copied from somewhere), but it doesn't work ideally:

entrez <- function(a, b, c) entrez_fetch(db = "nuccore", id = as.vector(a), seq_start = as.vector(b), seq_stop = as.vector(c), rettype = "fasta")
blast_sequence <- apply(dataset[, c('gi', 'start', 'end')], MARGIN = 1, function(y) entrez(y['gi'], y['start'], y['end']))

Would there be a better option that would comply with the NCBI restrictions?

Many thanks for a super useful package!

Pieter

dwinter commented 6 years ago

Hi @PWSmit,

I don't think there is a way to pass variable seq_start and seq_stop values for a set of IDs, so the solution you've come up with might be the best you can do.
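For what it's worth, the one-request-per-ID approach could be written roughly as below. This is only a sketch: `fetch_ranges` and its `fetch` argument are hypothetical names made up for illustration, and it assumes the IDs, starts and stops are available as three vectors of equal length. Iterating over the columns with `Map()` rather than using `apply()` on the data frame avoids the rows being coerced to character first.

```r
# Fetch one sub-sequence per ID, one request per ID.
# `fetch` defaults to rentrez::entrez_fetch; it is exposed as an argument
# (an assumption of this sketch) only so the loop can be tried offline
# with a stand-in function.
fetch_ranges <- function(ids, starts, stops, fetch = rentrez::entrez_fetch) {
  stopifnot(length(ids) == length(starts), length(ids) == length(stops))
  out <- Map(function(id, from, to) {
    # seq_start and seq_stop are passed through to the EFetch request
    fetch(db = "nuccore", id = id, rettype = "fasta",
          seq_start = from, seq_stop = to)
  }, as.character(ids), starts, stops)
  unlist(out, use.names = FALSE)
}
```

With the data frame from the question, this would presumably be called as `blast_sequence <- fetch_ranges(dataset$gi, dataset$start, dataset$end)`.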

If you get yourself an API key and set it, rentrez will be able to send requests a little faster than without the key (NCBI allows up to 10 requests per second with a key, compared with 3 without). Of course, if the sequences are not huge, the other option is to download the full-length sequences and then chop them up afterwards.
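As a rough illustration of the "download then chop" option, something like the following might do. It assumes the full-length records have already been downloaded and parsed into a plain character vector of sequences, in the same order as the coordinates; `chop_sequences` is a hypothetical helper name, and the key in the comment is a placeholder.

```r
# Setting an API key would look like this (placeholder key, not run here):
# rentrez::set_entrez_key("YOUR_API_KEY")

# Cut each full-length sequence down to its own start/stop coordinates.
# Coordinates are 1-based and inclusive, matching seq_start / seq_stop.
chop_sequences <- function(seqs, starts, stops) {
  stopifnot(length(seqs) == length(starts), length(seqs) == length(stops))
  substring(seqs, first = starts, last = stops)
}

# Toy sequences standing in for downloaded records
full <- c("ACGTACGTAC", "TTGACCA")
chop_sequences(full, starts = c(2, 1), stops = c(5, 3))
```

This trades extra download volume for far fewer requests, so it is probably only worthwhile when the records are small relative to the regions of interest.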

Sorry I can't provide a better solution!

PWSmit commented 6 years ago

OK many thanks for your feedback!